ABSTRACT


An empirical orthogonal function (EOF) representation of relative vorticity is used
to forecast recurvature (change in storm heading from west to east of 000° N) of western
North Pacific tropical cyclones. The time-dependent coefficients of the first and second
EOF eigenvectors vary in a systematic manner as the tropical cyclone recurves around
the subtropical ridge and tend to cluster about the same values at recurvature time. In
contrast, the coefficients for straight-moving storms tend to cluster in a different region
in EOF space. Exploiting this Euclidean distance approach, additional EOF coefficients
are identified that best represent the vorticity fields of recurving and straight-moving
storms. An individual case is then classified into the closest time-to-recurvature category
(in 12-h intervals) or the straight-moving storm category, as measured in multidimensional
EOF space. Although rather subjective, the Euclidean method demonstrates skill relative
to climatological forecasts. A more objective discriminant analysis technique is also
tested. A final version that involves the first six EOF coefficients of the 250 mb vorticity
field is useful (72% correct) in identifying recurvers or straight-movers during the 72-h
forecast period. Skill in classifying situations within 12-h time-to-recurvature groups is
low, but might be improved using other analysis techniques or in combination with other
predictors.
Tropical cyclones have formidable destructive power and annually exact tremendous
losses in lives and property. The western North Pacific Ocean is the most active tropical
cyclone basin in the world. An average of 31 tropical cyclones occurred annually
during the 25-year period ending in 1984 (ATCR 1984). The damage from these storms
can be minimized only through preparedness and avoidance. Precautionary measures
can require considerable time. Therefore, accurate storm forecasts are critically important
to both the military and civilian communities.


A.  BACKGROUND 


Tropical cyclones can be classified into three broad categories based on their track.
If a storm moves west or northwest throughout its life, it is classified as a straight-mover
(TY Agnes in Fig. 1). A storm that turns from a westward or northwestward path
through north to a northeastward track is defined as a recurver (ST Vanessa in Fig. 1).
Storms that do not fit either the straight-mover or the recurver categories are classified
as odd-movers (ST Bill in Fig. 1). Odd-mover tracks are typically erratic and may display
loops or a stairstep-type track. The largest forecast errors occur when recurving
storms are forecast to move straight toward the west or northwest, or when
straight-movers are forecast to recurve to the north or northeast. Incorrect recurvature
forecasts result in 72-h track forecast errors of over 1850 km (1000 n mi) almost
every year (Sandgathe 1987). Situations associated with recurvature, due either to
cyclone-midlatitude trough interaction or to cyclone-subtropical ridge interaction, are
listed among the Joint Typhoon Warning Center's (JTWC's) most difficult forecast
problems (Sandgathe 1987). Since nearly half of all western North Pacific tropical
cyclones eventually recurve, these recurvature forecast questions are frequently faced by
operational forecasters.

None of the present objective forecast aids in operational use are specifically designed
to identify recurvature situations. Leftwich (1979) and Lage (1982) used regression
analysis techniques to predict recurvature, which they defined as a net
displacement north of 315° during the forecast period. Leftwich (1979) included posi-
probability of recurvature in Atlantic tropical cyclones. Geopotential heights were represented
by gridpoint values on a relocatable storm-centered grid. Leftwich concluded
that the inclusion of synoptic predictors improved the model forecast skill, but none of
his statistical models out-performed climatological forecasts. Lage (1982) used an empirical
orthogonal function (EOF) representation of 500 mb geopotential height fields
plus persistence-related variables to predict western North Pacific tropical cyclone recurvature
or non-recurvature at 36-, 54- and 72-h forecast intervals. The combination
of persistence plus EOF predictors consistently out-performed the persistence-alone or
the EOF-predictors-only methods. Each of these three techniques was superior to
climatology and chance at all forecast times.

The purpose of this study is to test the feasibility of using an EOF representation
of the synoptic vorticity fields at 700, 400 and 250 mb to identify recurvature situations
in western North Pacific tropical cyclones. Because horizontal pressure gradients are
generally weak and geostrophic relationships deteriorate in the tropics, geopotential
heights provide a poor estimate of the steering flow. Since vorticity combines the
steering effects of both zonal and meridional winds, it should provide a more accurate
measure of steering with fewer predictors than would be required if the two components
of the wind were used separately as predictors. An EOF representation of a synoptic
field such as vorticity offers several important advantages over gridpoint values. Because
EOF predictors represent spatial patterns in environmental fields, they contain more
synoptic information and are less affected by observational errors. Because relatively
few EOF predictors are required to represent large amounts of variance in synoptic
patterns, considerable savings in computer storage and forecast model run times can be
realized using this method.

EOF predictors have been used successfully in statistical-synoptic models to forecast
tropical cyclone motion (Shaffer and Elsberry 1982; Peak et al. 1986; Schott et al. 1987;
and Elsberry et al. 1988). Shaffer (1982) demonstrated the usefulness of EOF representation
of 500 mb geopotential heights as synoptic forcing predictors in statistical-synoptic
track prediction schemes. In a similar study, Wilson (1984) used EOF
representation of 700, 400 and 250 mb wind component fields to forecast tropical
cyclone motion. Schott (1985) stratified forecast situations by the cyclone direction of
motion to develop a statistical adjustment scheme involving EOF predictors that reduced
the systematic errors in a dynamical track prediction model. Meanor (1987) used
Schott's stratification scheme and EOF predictors of vertical wind shear to develop a
similar model to adjust for systematic errors in a dynamical track prediction model.
Weniger (1987) adopted Meanor's EOF predictors of vertical wind shear to develop a
successful tropical cyclone intensity forecast model. Gunzelman (1990) used the EOF
approach as a filter to represent the "signal" in the vorticity field, and suggested that
several different forecast situations could be interpreted as an advection of these filtered
vorticity fields.

B.  OBJECTIVE 

The objective of this study is to demonstrate the ability of an EOF representation
of the synoptic vorticity field to identify potential recurvature situations. The hypothesis
is that the adjacent synoptic features cause the turning motion that leads to tropical
cyclone recurvature. Consequently, the sets of EOF coefficients for the vorticity fields
associated with recurvature should be different from those associated with straight-track
situations. The question is, how far in advance of recurvature are the recurvature EOF
coefficients distinguishable from the straight-track EOF coefficients? Classification
goals are two-fold: first, to identify the overall track type as a recurver versus a
straight-mover; and second, to identify the time to recurvature with the best possible
time resolution. Recurvature is defined here as the time when the storm heading changes
from west of 000° North to east of 000° North. A track segment will be classified as a
straight-mover if the storm does not recurve during the next 72 h, which corresponds to
the official JTWC forecast period. The time to recurvature will be specified in 12-h
increments. In summary, the first goal of the study is to determine whether the present
vorticity field is representative of a recurvature situation within 72 h versus that of a
straight-mover; if so, the second goal is to specify the most likely time to recurvature.

Two methods are used to develop the classification model. In the Euclidean distance
approach, each case is classified into the group that has the closest mean EOF predictor
values as measured in multidimensional space. This simple approach provides physical
insight into the classification problem. The difficulty is in determining which predictors
best separate the groups. Therefore, a discriminant analysis package is also used to
demonstrate more objectively the predictive capabilities of an EOF representation of the
vorticity field.
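
The Euclidean distance rule described above can be illustrated with a minimal sketch. The function names, the two-coefficient learning set and the group labels used here are hypothetical and chosen only for illustration; they are not the predictors or values used in this study.

```python
import numpy as np

def group_means(coeffs, labels):
    """Mean EOF-coefficient vector of each group in the learning set."""
    labels = np.asarray(labels)
    return {str(g): coeffs[labels == g].mean(axis=0) for g in np.unique(labels)}

def classify(case, means):
    """Return the group whose mean is closest (Euclidean distance) to the case."""
    return min(means, key=lambda g: np.linalg.norm(case - means[g]))

# Hypothetical learning set: two EOF coefficients per case (illustrative values only).
coeffs = np.array([[-4.0, 1.0], [-5.0, 0.5], [3.0, -2.0], [4.0, -1.0]])
labels = ["R-00h", "R-00h", "NR", "NR"]

means = group_means(coeffs, labels)
predicted = classify(np.array([-3.5, 0.0]), means)
print(predicted)
```

With more predictors, only the length of each coefficient vector changes; the choice of which coefficients to include is the subjective step noted in the text.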
II. DATA AND METHODS


A.  DATA  DESCRIPTION 

The cases in this study are 12-hourly data for western North Pacific tropical
cyclones during 1979-1984. These cases are a combination of the cases analyzed by
Wilson (1984), Peak et al. (1986) and Gunzelman (1990). Wilson and Peak et al. extracted
the Global Band Analyses (GBA) wind fields for each case. Gunzelman computed
the relative vorticity from these wind fields and performed the EOF analyses of the
vorticity fields. The following restrictions were applied to the selection of cases:

• a tropical cyclone attaining at least tropical storm strength (maximum sustained
winds of 18 m/s (35 kt) or greater);

• a best track position west of the dateline, east of 100° E and south of 34.6° N; and

• the meridional and zonal wind components of the GBA are available at 700, 400
and 250 mb.

A total of 1573 cases met these requirements and were analyzed.

1.  Field  description 

The GBA wind fields are operationally generated every 12 h by the United
States Navy Fleet Numerical Oceanography Center (FNOC). The GBA provide global
longitudinal coverage between 40.956° S and 59.745° N. The analyses are produced on
a Mercator grid with spacing of 2.5° latitude at 22.5° N and S. Although zonal and
meridional wind fields are also available at the surface and 200 mb, only the 700, 400
and 250 mb levels are used. Analyses are based on surface observations, ship reports,
rawinsondes, pibals, aircraft reports and satellite-derived cloud motion vectors. When
a tropical cyclone is present, eight bogus winds are inserted at the surface 80 km (43 n
mi) from the center of the cyclone, and are coupled vertically via the thermal wind
equation using temperature analyses at the intermediate levels. A detailed description
of the GBA is contained in the U.S. Naval Weather Service (1975).

Wilson (1984) and Peak et al. (1986) used a bi-linear interpolation scheme to
interpolate the zonal (u) and meridional (v) GBA wind components onto a storm-centered
grid for each case. The grid consists of data points with a zonal and
meridional separation of 277.8 km (150 n mi). It is geographically-oriented with 17
gridpoints north-to-south and 31 gridpoints east-to-west. The center of the cyclone,
based on the JTWC warning position, is always located at gridpoint (16,9). Gunzelman
(1990) computed relative vorticity

$$\zeta = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}$$

using centered finite differences at the internal gridpoints, and one-sided differences at
the grid boundaries. Gunzelman noted that the mean vorticity fields are nearly vertical
near the tropical cyclone, and the largest positive vorticity values around the cyclone are
at 700 mb and decrease with height. The 700 mb vorticity field also has the largest difference
between the positive values associated with the cyclone and the negative values
associated with the subtropical ridge, and the gradient decreases with height. However,
the magnitude of the vorticity associated with the subtropical ridge increases with height
and is greatest at 250 mb.
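
The finite-difference vorticity calculation can be sketched as follows. This is an illustration rather than Gunzelman's code: it assumes a uniform spacing of 277.8 km in both directions, arrays indexed [north-south, east-west] with both coordinates increasing with the index, and it relies on `numpy.gradient`, which uses centered differences at interior points and one-sided differences at the boundaries. The solid-body rotation test field is hypothetical.

```python
import numpy as np

def relative_vorticity(u, v, dx, dy):
    """Relative vorticity zeta = dv/dx - du/dy on a regular grid.

    u, v   : 2-D wind components (m/s), indexed [y, x]
    dx, dy : grid spacing (m) in the x and y directions
    """
    dvdx = np.gradient(v, dx, axis=1)  # centered inside, one-sided at edges
    dudy = np.gradient(u, dy, axis=0)
    return dvdx - dudy

# Hypothetical test field: solid-body rotation, u = -omega*y, v = omega*x,
# for which the exact vorticity is 2*omega everywhere.
spacing = 277.8e3            # m, grid separation quoted in the text
ny, nx = 17, 31              # gridpoints north-south and east-west
y, x = np.meshgrid(np.arange(ny) * spacing, np.arange(nx) * spacing, indexing="ij")
omega = 1.0e-5               # s^-1, illustrative rotation rate
zeta = relative_vorticity(-omega * y, omega * x, spacing, spacing)
```

Because the test winds vary linearly, both the centered and the one-sided differences are exact, which makes the field a convenient check on the sign convention.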

2.  Empirical  orthogonal  function  analysis 

The EOF method used by Gunzelman (1990) paralleled the procedures used by
Wilson (1984) and Meanor (1987), except that it was applied to relative vorticity rather
than wind components or the vertical wind shear. The EOF analysis was on the 527
point storm-centered grid (Section II.A.1) and was based on the same 682 dependent
cases during 1979-1983 that Wilson used.

In this method, orthogonal eigenvectors and their associated eigenvalues (coefficients)
are calculated from the dependent set vorticity fields at each pressure level.
First, X (527) eigenvectors are calculated from a normalized 527 x 682 matrix of
X (527) gridpoint values for the Y (682) cases. The original synoptic gridpoint values
can be recovered by the linear summation of the products of the eigenvectors and their
associated coefficients. The first eigenvector (spatial pattern) contains the largest variance.
The second eigenvector contains the largest amount of the variance not explained
by the first, and so on. Once the eigenvectors are determined from the dependent data
set, the time dependence in the synoptic pattern for each case is contained in the EOF
coefficients.
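
A minimal sketch of an EOF decomposition and of the recovery of the gridpoint values from eigenvectors and coefficients is given below. It uses a singular value decomposition of the mean-removed data, which is one common way to obtain EOFs and is not necessarily the procedure used in the studies cited; the normalization step mentioned in the text is reduced here to removal of the mean, and the synthetic data and function names are hypothetical.

```python
import numpy as np

def eof_analysis(fields):
    """EOFs of a (n_cases, n_points) array of gridpoint values.

    Returns the mean field, the eigenvectors (one spatial pattern per row),
    the eigenvalues (variance of each mode) and the case-dependent coefficients.
    """
    mean = fields.mean(axis=0)
    anomalies = fields - mean
    # Rows of vt are orthonormal spatial patterns ordered by decreasing variance.
    _, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eigenvalues = s**2 / (fields.shape[0] - 1)
    coefficients = anomalies @ vt.T
    return mean, vt, eigenvalues, coefficients

def reconstruct(mean, eigenvectors, coefficients, n_modes):
    """Sum of the first n_modes eigenvectors weighted by their coefficients."""
    return mean + coefficients[:, :n_modes] @ eigenvectors[:n_modes]

# Synthetic stand-in for the vorticity data: 40 cases on a 12-point grid.
rng = np.random.default_rng(0)
fields = rng.normal(size=(40, 12))

mean, vecs, vals, coefs = eof_analysis(fields)
full = reconstruct(mean, vecs, coefs, n_modes=12)   # all modes
partial = reconstruct(mean, vecs, coefs, n_modes=4) # truncated representation
explained = np.cumsum(vals) / vals.sum()            # cumulative variance fraction
```

Using all modes recovers the original values, while a truncated sum keeps only the leading patterns; `explained` corresponds to the kind of cumulative percentage tabulated for the vorticity modes.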

One of the advantages of the EOF representation is that a relatively small
number of EOF eigenvectors can be used to represent a synoptic pattern. To determine
the minimum number of eigenvectors that are needed to represent the signal in the
vorticity field, Gunzelman (1990) applied the Preisendorfer and Barnett (1977) Monte
Carlo technique to distinguish between eigenvectors with signal vice those with noise.
In this method, eigenvalues for the physical data are compared to eigenvalues for
randomly generated data. If the physical eigenvalue deviates significantly from the
eigenvalue computed from a random vorticity field, there is reasonable assurance that
the associated eigenvector is describing signal rather than noise. Based on Gunzelman's
(1990) results, the first 45 vorticity modes are retained as potential descriptors of the
synoptic fields associated with recurvature in this study. The first 45 modes explain between
72.8 and 77.5% of the variance of the vorticity fields at the three pressure levels (Table 1).
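
The idea of comparing physical eigenvalues with those of random data can be sketched as below. This is a simplified illustration in the spirit of the Monte Carlo test, not a reproduction of the Preisendorfer and Barnett (1977) procedure or of Gunzelman's application of it: the use of Gaussian random fields, normalized eigenvalues, 200 trials, a 95th-percentile threshold and the synthetic two-pattern data set are all assumptions.

```python
import numpy as np

def eigenvalue_fractions(data):
    """Fraction of total variance carried by each EOF mode, largest first."""
    anomalies = data - data.mean(axis=0)
    s = np.linalg.svd(anomalies, compute_uv=False)
    return s**2 / np.sum(s**2)

def significant_modes(data, n_trials=200, percentile=95.0, seed=0):
    """Flag modes whose variance fraction exceeds that of random data.

    Random Gaussian fields with the same shape as the data are decomposed
    n_trials times; a mode is flagged when its observed fraction is above
    the chosen percentile of the random fractions for the same mode number.
    """
    rng = np.random.default_rng(seed)
    observed = eigenvalue_fractions(data)
    random_fracs = np.array(
        [eigenvalue_fractions(rng.normal(size=data.shape)) for _ in range(n_trials)]
    )
    threshold = np.percentile(random_fracs, percentile, axis=0)
    return observed > threshold

# Synthetic data: two strong orthonormal patterns plus unit-variance noise.
rng = np.random.default_rng(1)
n_cases, n_points = 100, 30
patterns = np.linalg.qr(rng.normal(size=(n_points, 2)))[0].T
signal = 10.0 * rng.normal(size=(n_cases, 2)) @ patterns
data = signal + rng.normal(size=(n_cases, n_points))

keep = significant_modes(data)
```

In this synthetic case the two imposed patterns stand well above the random spectrum, while the third mode, which contains only noise, does not.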


Table 1. PERCENTAGE OF EXPLAINED VARIANCE WITH 1 TO 45
MODES: Cumulative percentage of variance (95% confidence) with 1 to
45 EOF modes retained for the relative vorticity fields at three pressure
levels (after Gunzelman 1990).


MODE    700 MB    400 MB    250 MB

  1       7.6      10.2      11.4
  2      13.4      18.1      19.1
  3      16.9      22.0      23.6
  4      20.0      25.5      27.6
  5      22.7      28.7      31.0
  6      25.3      31.5      33.9
  7      27.7      34.1      36.6
  8      30.1      36.5      39.2
  9      32.3      38.7      41.4
 10      34.4      40.8      43.4
  .        .         .         .
 40      69.4      73.8      74.7
 41      70.1      74.5      75.3
 42      70.8      75.1      75.9
 43      71.5      75.7      76.4
 44      72.1      76.3      77.0
 45      72.8      76.9      77.5

Each  eigenvector  consists  of  527  values  that  represent  a  spatial  pattern  on  the 
31x17  analysis  grid.  The  magnitude  of  the  associated  time-dependent  EOF  coefficient 
indicates  the  relative  importance  of  that  pattern  in  each  specific  case.  A  negative  EOF 
coefficient  indicates  that  the  identical  spatial  pattern  applies,  except  that  the  maxima 
and  minima  are  reversed. 

The  first  eigenvector  for  700  mb  vorticity  (Fig.  2)  can  be  interpreted  as  a  tropical 
cyclone  in  the  subtropical  ridge  if  this  pattern  is  multiplied  by  a  negative  coefficient. 
For  example,  the  700  mb  Mode  1  coefficient  for  ST  Vanessa  at  recurvature  time  is  -4.57. 
Therefore,  the  opposite  pattern  with  a  positive  vorticity  value  at  the  storm  center  (dot) 
applies,  and  represents  a  recurving  tropical  cyclone  at  the  axis  of  the  subtropical  ridge. 
Mode 1 eigenvectors for 400 and 250 mb relative vorticity (not shown) represent
large-scale patterns similar to the Mode 1 pattern for 700 mb in Fig. 2. As the spatial
patterns become increasingly more complex for higher mode eigenvectors, the patterns
become increasingly more dissimilar among the three pressure levels (see Gunzelman
1990 for further discussion).


Reconstructed 700 mb vorticity fields for ST Vanessa at recurvature time using
only the first 45 EOF modes and all 527 modes are compared in Fig. 3. The basic pattern
of a tropical cyclone at the axis of the subtropical ridge with a strong vorticity
gradient to the east, and cyclonic vorticity associated with the midlatitude trough to the
north, is represented equally well with 45 EOF modes as with all 527 modes. The addition
of the higher EOF modes adds smaller scale features, which are assumed to represent
noise in the vorticity field.

B.  SELECTION  OF  CASES 

A recurvature forecast model learning set is selected from the 1573 cases in the
1979-1984 data set. As a first step in the selection process, the data are categorized by
track type and time to recurvature. Initial identification of the cases as recurvers,
straight-movers and odd-movers is based on the tropical cyclone track categories
assigned by Miller et al. (1988). For each recurving tropical cyclone, the storm heading
between successive 6-h JTWC best track positions is computed and the recurvature time
is identified as the 00 or 12 UTC nearest the 6-h interval in which the storm heading
changed from west of 000° North to east of 000° North. This synoptic map time, for
which a GBA is available to calculate the vorticity EOF coefficients, will be referred to
as R-00h, where R indicates recurvature and -00h indicates the number of hours (0) prior
to recurvature time. Recurver cases within 96 h of recurvature are then categorized
based on the time to recurvature into the R-96h through R-00h classification groups.
Cases more than 96 h prior to recurvature are identified as pre-recurvers (PR). Cases
after recurvature are excluded from the forecast model learning set. The straight-mover
cases are identified as non-recurvers (NR) if a minimum of 72 h remains in the track to
establish that recurvature does not follow in that time. This requirement excludes from
the learning set all straight-mover cases that cannot be verified as non-recurvature situations
throughout a 72-h forecast period. Odd-mover cases (382 cases from 33 tropical
cyclones) are not included in the model learning set, but will be used to test the ability
of the final EOF recurvature forecast model to classify these cases into the straight-mover
or recurver group that most closely describes the storm motion.

Fig. 3. Reconstructed 700 mb vorticity for ST Vanessa at recurvature. Relative
vorticity contours (lO^r1) reconstructed from the first 45 EOF modes (top) and
from all 527 modes (bottom). Positive (negative) values are solid (dashed). North
latitude is along the y-axis and east longitude is along the x-axis. The black dot indicates
the storm center position.

After screening, a total of 782 cases from 97 storms are retained in the model
learning set (Table 2). Although the learning set cases in the Euclidean distance approach
and the discriminant analysis approach differ, the entire learning set will be used
to compare the overall prediction skill of the approaches.
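
The categorization rules for the learning set can be summarized in a short sketch. The function names and the convention of passing `None` for a straight-mover case with at least 72 h of track remaining are hypothetical; the grouping of R-84h, R-96h, PR and NR as straight-track situations follows the 72-h forecast period used in this study.

```python
def forecast_category(hours_to_recurvature):
    """12-h forecast category of a learning set case.

    hours_to_recurvature : integer number of hours before recurvature time
        (a multiple of 12) for a recurver case, or None for a straight-mover
        case with at least 72 h of track remaining.
    """
    if hours_to_recurvature is None:
        return "NR"                      # non-recurver
    if hours_to_recurvature < 0:
        raise ValueError("cases after recurvature are excluded from the learning set")
    if hours_to_recurvature > 96:
        return "PR"                      # pre-recurver
    return f"R-{hours_to_recurvature:02d}h"

def track_type(category):
    """Track type verified within the 72-h forecast period."""
    if category in ("NR", "PR", "R-84h", "R-96h"):
        return "STRAIGHT"
    return "RECURVER"

print(forecast_category(36), track_type(forecast_category(36)))
print(forecast_category(108), track_type(forecast_category(108)))
```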

C.  CRITERIA  FOR  EVALUATING  MODEL  PERFORMANCE 

Evaluation criteria are chosen to test the forecast model's ability to meet the two
classification goals: identification of track type and identification of the time to recurvature.
Since no objective guidance is available (or official forecast is issued) as to
whether a storm will be a recurver or a straight-mover, the only absolute measure of
usefulness is a comparison with a climatological forecast of recurvature.

1.  Percent  correct 

The percent of cases correctly forecast as recurver (%R) or straight (%S) and
the total correctly forecast in both track type categories (%T) tests the model's ability
to identify the overall track type. The percent correct is calculated for recurver and
straight-track types defined by the times in Table 2. That is, a classification into any
of the R-72h through R-00h groups is considered to be a correct forecast of a recurver.
Similarly, classification into the NR, PR and R-96h through R-84h groups represents
a correct forecast of a straight-track situation, because the tropical cyclone did not recurve
during the 72-h forecast period. The simple percent correct measure is also used
in evaluating the time-to-recurvature prediction performance of the model. In that case,
only a classification into the appropriate time-to-recurvature group will be credited as a
correct forecast.

Table 2. RECURVATURE MODEL LEARNING SET CASES BY FORECAST
CATEGORY: Number of 1979-1984 tropical cyclones that are categorized
as recurver or straight track types. The recurver learning set is defined
as those times within 72 h of recurvature time (R-00h). The straight
learning set includes all times preceding 72 h of recurvature time, plus selected
times from the straight-track storms. The number of cases retained
in the model learning set is listed for each track-type category and for each
12-h forecast category.


              NUMBER OF            12-H FORECAST    NUMBER
TRACK TYPE    TROPICAL CYCLONES    CATEGORIES       OF CASES

RECURVER      60                   R-00H              55
                                   R-12H              56
                                   R-24H              55
                                   R-36H              52
                                   R-48H              46
                                   R-60H              41
                                   R-72H              32
                                   TOTAL             337

STRAIGHT      37                   R-84H              30
                                   R-96H              24
                                   PR                113
                                   NR                278
                                   TOTAL             445

TOTAL         97                                     782
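
The track-type percent correct measures defined in this subsection can be sketched from a classification matrix as follows. The matrix layout (verification categories R-00h through R-96h, PR and NR as rows; forecast groups R-00h through R-96h and PRNR as columns), the function name and the toy counts are assumptions made for illustration only.

```python
import numpy as np

N_RECURVER_GROUPS = 7  # R-00h through R-72h

def percent_correct(matrix):
    """Return (%R, %S, %T) from an 11 x 10 classification matrix.

    Rows are verification categories (R-00h..R-96h, PR, NR); columns are
    forecast groups (R-00h..R-96h, PRNR).
    """
    m = np.asarray(matrix, dtype=float)
    k = N_RECURVER_GROUPS
    recurver_hits = m[:k, :k].sum()   # recurver case forecast into any R-00h..R-72h group
    straight_hits = m[k:, k:].sum()   # straight case forecast into R-84h, R-96h or PRNR
    recurver_total = m[:k].sum()
    straight_total = m[k:].sum()
    pct_r = 100.0 * recurver_hits / recurver_total
    pct_s = 100.0 * straight_hits / straight_total
    pct_t = 100.0 * (recurver_hits + straight_hits) / (recurver_total + straight_total)
    return pct_r, pct_s, pct_t

# Toy counts (not results from this study).
toy = np.zeros((11, 10))
toy[0, 0] = 8     # R-00h cases forecast as R-00h (correct track type)
toy[0, 9] = 2     # R-00h cases forecast as PRNR (wrong track type)
toy[10, 9] = 15   # NR cases forecast as PRNR (correct track type)
toy[10, 3] = 5    # NR cases forecast as R-36h (wrong track type)

pct_r, pct_s, pct_t = percent_correct(toy)
```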

2.  Classification  matrix  scores 

Classification matrix scores assign penalty points to misclassifications as a linear
function of the number of 12-h categories between the prediction and the verification
groups. That is, one additional penalty point is assigned for each 12-h group between
the model forecast and the verification. Since a misclassification of a recurvature case
into the PRNR forecast group represents a larger error, two additional penalty points
are assigned in the PRNR category relative to the R-96h category. Because this is a
penalty score, higher skill is represented by numbers close to zero. A penalty score of
1.0 would indicate that the average misclassification is off by one category.

Three classification matrix scores are defined based on the matrix of penalty
points in Table 3: D-score (dependent); I-score (independent); and R-score (recurver).
Given a classification matrix that contains the number of cases that are forecast in each
classification group (columns) and verify in each verification category (rows), the penalty
points in Table 3 are assessed by multiplying the number of cases by the penalty points
for that error. No penalty points are given to the correct classifications along the
diagonal.


Table 3. MATRIX OF PENALTY POINTS FOR CLASSIFICATION MATRIX
SCORES: Penalty points are assessed for erroneous forecasts of time-to-recurvature
in 12-h increments or as PRNR. These penalty points are
summed over three subsets to calculate the classification matrix D-, I- and
R-scores. The matrix columns (forecast model classification groups) and
rows (case verification categories) are the same as those in the model
classification matrix.


                           CLASSIFICATION
VERIFY     00   12   24   36   48   60   72   84   96   PRNR

R-00H       0    1    2    3    4    5    6    7    8     10
R-12H       1    0    1    2    3    4    5    6    7      9
R-24H       2    1    0    1    2    3    4    5    6      8
R-36H       3    2    1    0    1    2    3    4    5      7
R-48H       4    3    2    1    0    1    2    3    4      6
R-60H       5    4    3    2    1    0    1    2    3      5
R-72H       6    5    4    3    2    1    0    1    2      4
R-84H       7    6    5    4    3    2    1    0    1      3
R-96H       8    7    6    5    4    3    2    1    0      2
PR         10    9    8    7    6    5    4    3    2      0
NR         10    9    8    7    6    5    4    3    2      0

The three classification matrix scores are obtained by multiplying the classification
matrix of model results, element by element, by the penalty point matrix and
calculating three sums of the products. These sums are then normalized by the number
of cases in the sample so that the scores can be compared for different sample sizes. The
three classification matrix scores examine various aspects of the forecast model skill by
scoring only cases that belong to certain verification categories. The classification
matrix I-score includes cases
in all of the verification categories (R-00h through R-96h plus PR and NR) that are in
the independent sample. The D-score is designed to compare results from dependent
and independent samples, which contain different sets of cases. Since the PR cases are
not always included in the dependent set to define a PRNR classification group, the PR
case forecasts are excluded in the classification matrix D-score. Consequently, the D-score
and I-score will have similar magnitudes, with an offset that is proportional to the
performance of the forecast model on the PR cases. The D- and I-score will provide an
exact comparison of forecast skill only if the ratio of the combined number of PR and
NR cases to the combined number of R-00h through R-96h cases is the same for both
data sets (e.g., in the learning set). Since the PR and NR cases are assigned more penalty
points for misclassifications than the R-00h through R-96h cases, the relative number
of cases in each group will affect the matrix scores that score PR and NR forecasts. The
R-score is an indication of the model's ability to correctly identify the time to recurvature
in recurver cases. That is, the penalty scores in Table 3 are only summed over the
R-00h through R-72h verification categories.
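
The penalty-point scoring can be sketched as follows. The penalty matrix is built from the rule stated in the text (one point per 12-h group, with PRNR two points beyond R-96h). The normalization by the number of cases in the verification rows being scored is an assumption about how "the number of cases in the sample" is counted, and the function names and toy counts are hypothetical.

```python
import numpy as np

def penalty_matrix():
    """11 x 10 penalty points.

    Rows: verification categories R-00h..R-96h, PR, NR.
    Columns: forecast groups R-00h..R-96h, PRNR.
    """
    p = np.zeros((11, 10), dtype=int)
    steps = np.arange(9)                      # R-00h .. R-96h in 12-h steps
    p[:9, :9] = np.abs(steps[:, None] - steps[None, :])
    p[:9, 9] = 10 - steps                     # PRNR lies two points beyond R-96h
    p[9:, :9] = 10 - steps                    # PR or NR case forecast into an R group
    return p

def matrix_scores(classification):
    """I-, D- and R-scores of an 11 x 10 classification matrix of case counts."""
    c = np.asarray(classification, dtype=float)
    weighted = c * penalty_matrix()           # element-by-element product

    def score(rows):
        # Assumed normalization: cases in the verification rows being scored.
        return weighted[rows].sum() / c[rows].sum()

    all_rows = np.arange(11)
    return {
        "I": score(all_rows),                 # all verification categories
        "D": score(np.delete(all_rows, 9)),   # PR verification row excluded
        "R": score(all_rows[:7]),             # R-00h through R-72h only
    }

# Toy counts (not results from this study).
toy = np.zeros((11, 10))
toy[0, 0] = 5     # R-00h forecast as R-00h: no penalty
toy[0, 2] = 5     # R-00h forecast as R-24h: 2 points each
toy[9, 9] = 10    # PR forecast as PRNR: no penalty
toy[10, 0] = 2    # NR forecast as R-00h: 10 points each
toy[10, 9] = 8    # NR forecast as PRNR: no penalty

scores = matrix_scores(toy)
```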

3.  Climatological  forecasts  and  scores 

A climatological forecast is obtained by counting the number (N in Table 4) of
JTWC best track 00 and 12 UTC positions for 1979-1984 cyclones of tropical storm
strength or greater in each classification group (R-00h through R-96h plus PRNR).
Thus, the climatology data set contains all the learning set cases, plus additional cases
that were excluded from the learning set because either the best track position did not
meet the requirements in Section II.A or the GBA wind fields were not available at all
three pressure levels. The percentage of recurving (41.7), straight-moving (36.4) and
odd-moving storms (21.9) for these six years is representative of the percentages (42.5,
36.4 and 21.1, respectively) for the 28-year period 1945 to 1987 (Miller et al. 1988).

To obtain the climatological forecast classification matrix (Table 4), a fraction
of the learning set cases in each 12-h verification category is forecast into each of the
ten classification groups based on the percent of climatological cases in each of the ten
groups (percent in Table 4). By ignoring the straight-mover cases with less than 72 h
remaining and the odd-mover cases, these climatological forecasts can be compared to
the model forecasts predicated on similarly screened data. The skill scores for the
climatological forecasts of the learning set cases are given in Table 5. Any forecast
model should have a higher percent correct and lower D-score, I-score and R-score to be
considered useful to the forecaster.
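
The construction of a climatological forecast matrix can be sketched as an outer product of the verification counts and the climatological frequencies. The three-group example below uses hypothetical counts and percentages chosen for easy arithmetic; it is not the ten-group climatology of this study, and the function name is an assumption.

```python
import numpy as np

def climatological_matrix(verify_counts, climo_percent):
    """Spread the cases of each verification category over the classification
    groups in proportion to the climatological frequency of each group.

    Returns an unrounded matrix with verification categories as rows and
    classification groups as columns.
    """
    counts = np.asarray(verify_counts, dtype=float)
    fractions = np.asarray(climo_percent, dtype=float) / 100.0
    return np.outer(counts, fractions)

# Hypothetical three-group example (illustrative values only).
verify_counts = [20, 30, 50]     # learning set cases in each verification category
climo_percent = [10, 30, 60]     # climatological frequency of each group (%)

matrix = climatological_matrix(verify_counts, climo_percent)
hit_rate = 100.0 * np.diag(matrix) / np.asarray(verify_counts)
```

In such a forecast the percent correct for any category equals the climatological frequency of that category, regardless of how many cases verify in it, which is why the rounded counts in each column of the classification matrix scale with the row totals.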
Table 4. CLIMATOLOGICAL FORECASTS FOR THE LEARNING SET: The
learning set cases belonging to each verification category are classified into
the ten classification groups with the relative frequency (column labeled
percent) that the cases in the 1979-1984 climatology data set belong to
each of the ten classification groups. Since the number of classifications
is rounded to the nearest whole integer, the total is 780 vice 782.


VERIFY     CLIMATOLOGY            LEARNING SET CLASSIFICATIONS
           N     PERCENT     00   12   24   36   48   60   72   84   96   PRNR

R-00H      41      ?          ?    ?    ?    3    3    2    2    2    1     32
R-12H      40    ( 4.21)      4    3    3    ?    3    2    2    2    1     32
R-24H      68    ( 4.00)      3    3    3    3    3    2    2    2    1     32
R-36H       ?    ( 6.49)      3    3    3    3    2    2    2    2    1     30
R-48H      44    ( 4.76)      3    3    3    3    ?    2    2    1    1     27
R-60H      42    ( 4.35)      3    3    2    2    2    2    1    1    1     24
R-72H      33    ( 3.42)      2    2    2    2    2    1    1    1    1     19
STRAIGHT
R-84H      30    ( 3.11)      2    2    2    2    1    1    1    1    1     17
R-96H      24    ( 2.48)      2    1    1    1    1    1    1    1    1     14
PR        114    (PRNR)       7    7    7    6    6    5    4    4    3     45
NR        445    (57.87)     18   17   17   15   13   12    9    9    7    161


Table 5. FORECAST SKILL FOR CLIMATOLOGICAL FORECASTS: Percent
of recurver (%R), straight (%S) and total (%T) cases correctly classified
according to track type. D-, I- and R-score are classification matrix scores
that indicate skill in correctly classifying cases with 12-h accuracy. Scores
are computed from the actual number of learning set cases that
climatologically occur in each group vice the integer values presented in
Table 4.

%R      %S      %T      D-score   I-score   R-score
36.5    63.5    53.6    4.11      3.93      5.30

III. EUCLIDEAN DISTANCE METHOD


A.  BACKGROUND 

The Euclidean distance approach in this section examines both the physical changes
in the vorticity patterns that precede tropical cyclone recurvature and the ability to
distinguish among these patterns using an EOF representation. Since the time-dependent
EOF coefficients represent the synoptic fields that exist at each time, the coefficients
should vary in a systematic manner as the tropical cyclone moves around the subtropical
ridge during recurvature. Simple two-dimensional plots of the first and second EOF
coefficients on the x- and y-axes in Fig. 4 indicate that these coefficients for the 1984
recurvers have similar traces. The Mode 1 coefficients are initially positive, which
indicates a large-scale positive vorticity pattern centered along the latitude of the storm
center in the first eigenvector (Fig. 2) and represents the synoptic pattern while these
storms are still located in the monsoon trough. As these storms move northward out
of the monsoon trough and recurve, the Mode 1 coefficients decrease
and then become negative to represent the negative vorticity associated with the
subtropical ridge. At the time of recurvature, the first and second EOF coefficients for the
1984 recurvers tend to cluster in the same region on the two-dimensional plot. In
contrast, the 1984 straight-moving cyclones have EOF coefficients that cluster in a separate
region, and the odd-moving cyclones have coefficients that exhibit characteristics of both
the recurvers and straight-movers (Fig. 5). This leads to the hypothesis that an individual
cyclone may be distinguished as a recurver (straight-mover) if the EOF coefficients
for that cyclone are closer to the mean of the cluster associated with recurvers
(straight-movers). The questions are how far in advance of recurvature these differences
in EOF coefficients can be detected, and with what time accuracy.

Fig. 4. Time progression of the first and second EOF coefficients for 1984
recurvers. Markers indicate the values of the first (x-axis) and second (y-axis) EOF
coefficients of the 700 mb vorticity fields for all 12-hourly cases analyzed by
Gunzelman (1990). Values at recurvature time are circled and arrow heads mark the
last case in each storm sequence (see legend for storm number). The start and end
of the sequence for ST Vanessa (storm number 25 during 1984 is denoted 2584) are
labeled.

Fig. 5. Time progression of the first and second EOF coefficients for 1984
straight-movers and odd-movers. As in Fig. 4, except for 1984 straight-moving
storms (left) and odd-moving storms (right).

B. MODEL DEVELOPMENT

To test the hypothesis that individual cases may be classified according to the
closeness of their EOF coefficients to the mean values for the recurver and straight sets,
a classification model is developed using the Euclidean distance method. The Euclidean
distance (D) is calculated in multidimensional EOF space using the formula

D = \sqrt{(\alpha(a) - \bar{\alpha}(a))^2 + \cdots + (\alpha(i) - \bar{\alpha}(i))^2} ,  (3.1)

where \alpha is the EOF coefficient for the case, \bar{\alpha} is the mean of the EOF coefficients for the
forecast classification group, and the indices a through i represent the EOF modes used
as predictors. Separate distances are calculated relative to the mean EOF value of each
potential classification group, and then the classification is into the group with the
smallest distance.
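
The nearest-mean rule built on Eq. (3.1) can be sketched in a few lines. This is a minimal illustration only: the function names, the use of dictionaries keyed by EOF mode number, and the coefficient values are all hypothetical and are not taken from the data in this study.

```python
import math

def euclidean_distance(case, group_mean, modes):
    """Distance D of Eq. (3.1), summed over the chosen EOF modes."""
    return math.sqrt(sum((case[m] - group_mean[m]) ** 2 for m in modes))

def classify(case, group_means, modes):
    """Return the label of the group whose mean is nearest to the case."""
    return min(
        group_means,
        key=lambda g: euclidean_distance(case, group_means[g], modes),
    )

# Hypothetical group means and one case, keyed by EOF mode number.
group_means = {
    "R-00h": {1: -2.0, 2: 1.0},
    "PRNR": {1: 3.0, 2: 0.0},
}
case = {1: -1.0, 2: 0.5}

label = classify(case, group_means, modes=[1, 2])
```

With these made-up numbers the case lies much closer to the R-00h mean than to the PRNR mean, so it is assigned to R-00h. Passing a different `modes` list changes which EOF coefficients enter the distance, which is the predictor-selection question addressed later in this section.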

1. Forecast group means

Two issues in the development of this simple model are the selection of the
representative recurver and straight-mover cases to calculate the classification group
means, and the specification of the set of EOF modes that best distinguishes between the
recurver and straight-mover situations. A "clean" set of 15 recurving and 15
straight-moving storms is selected from the 1979-1984 data set in hopes of identifying the most
representative vorticity patterns for the classification categories. The following criteria
are used to select the clean sets:

•  a tropical cyclone attaining at least typhoon strength (maximum sustained winds
of 33 m s^-1 (65 kts) or greater);

•  formation  east  of  130°  E;  and 

•  a  typical  recurver  or  straight  track  exhibiting  no  significant  deviations. 
Tracks for the clean set storms are shown in Fig. 6. Because the clean storms
exhibit typical recurver- or straight-track motion, the EOF coefficients for these storms
should be representative of the typical vorticity patterns associated with recurver- or
straight-track motion.

The mean values of the first 45 time-dependent EOF coefficients are computed
from the clean recurver set at times R-00h through R-96h and from the clean straight-mover
set that is labeled NR. As in the time progressions of the first two EOF coefficients
for the 1984 storms in Figs. 4 and 5, considerable variability exists around the
12-hourly mean coefficients even in these clean set storms (not shown). To obtain more
representative transitions among the time-to-recurvature groups in EOF space, a running
mean value is calculated from three times centered on the desired time. For example,
the mean for recurvature time R is calculated from the EOF coefficients at
R-12h, R-00h and R+12h. The NR group averages also are calculated from three
consecutive 12-hourly cases. These cases are selected so that the average longitude of
the  clean  set  straight-mover  cases  (132.U60  E)  is  close  to  the  average  longitude  of  the 
clean  set  recurvers  at  recurvature  time  (130.99°  E).  Although  only  straight-mover  data 
are  used  to  define  the  PRNR  classification  group,  the  Euclidean  distance  approach 
should  distinguish  straight-moving  cases  (NR)  as  well  as  recurving  storm  cases  that  are 
more  than  96  h  before  recurvature  (PR). 
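
A sketch of the three-time running mean follows. It assumes that the cases at the three times are pooled before averaging (averaging the three 12-hourly means instead would weight the times differently); the function name, the convention that the dictionary key is hours before recurvature (so R+12h is -12), and the coefficient values are hypothetical.

```python
def running_group_mean(coeffs_by_time, center_h, step_h=12):
    """Mean EOF coefficient vector pooled over three consecutive times.

    coeffs_by_time maps hours before recurvature (R+12h is -12) to a
    list of cases, each case being a list of EOF coefficients.
    """
    times = (center_h - step_h, center_h, center_h + step_h)
    pooled = [case for t in times for case in coeffs_by_time.get(t, [])]
    n_modes = len(pooled[0])
    return [sum(case[k] for case in pooled) / len(pooled) for k in range(n_modes)]

# Hypothetical two-mode coefficients around recurvature time.
coeffs_by_time = {
    -12: [[1.0, 0.0]],              # R+12h
    0: [[2.0, 2.0], [3.0, 4.0]],    # R-00h
    12: [[6.0, 2.0]],               # R-12h
}

mean_r00 = running_group_mean(coeffs_by_time, center_h=0)
```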

Vorticity fields at each pressure level (700, 400 and 250 mb) reconstructed from
the mean EOF coefficients for each classification group (Figs. 7, 8 and 9) illustrate the
evolution of the synoptic patterns associated with recurvature. These patterns are
similar at all three levels. The sequence starts with the NR pattern in which the subtropical
ridge is well defined by the broad anticyclonic (negative) vorticity center to the north of
the cyclone center. Such a pattern would be expected to produce westward or
northwestward storm motion and a straight-type track. At R-96h, the anticyclonic
vorticity associated with the subtropical ridge is weaker to the north and stronger to the
northeast of the storm center than it was in the NR pattern. Proceeding toward
recurvature time, the cyclonic (positive) vorticity associated with the storm and the
anticyclonic vorticity associated with the subtropical ridge increase in magnitude as the
composite "clean-set storm" moves north-northwest around the ridge. At recurvature
time, the storm center position is at the axis of the ridge and only a relatively weak
region of anticyclonic vorticity is found between the storm and the midlatitude cyclonic
vorticity to the north. The differences among the recurvature patterns at the three
pressure levels are similar to the relative vorticity differences with height noted by
Gunzelman (1990), as described in Section II.A.1.

Fig. 6. Clean sets of recurvers and straight-movers for the Euclidean distance
approach. JTWC best tracks for the 15 recurving (top) and 15 straight-moving
(bottom) clean set storms during 1979-1984. These storms are used to calculate the
mean time-dependent EOF coefficients that identify the recurvers and the
straight-movers for the Euclidean approach.

Fig. 7. Reconstructed 700 mb vorticity fields for clean set composites. Relative
vorticity contours (10^-5 s^-1) are reconstructed from the means of the first 45 EOF
modes for the clean set storms at R-00h (top left), R-24h (top right), R-48h (middle
left), R-72h (middle right), R-96h (bottom left), and NR (bottom right). Positive
(negative) values are solid (dashed). North latitude is along the y-axis and east
longitude is along the x-axis. The black dot indicates the storm center position.

Fig. 8. Reconstructed 400 mb vorticity fields for clean set composites. Time
intervals and contours are similar to Fig. 7.


2. Predictor modes

The objective is to select the set of EOF predictors that best separates the
time-to-recurvature and PRNR classification groups, as defined by the clean set mean
values in multidimensional space. Since the Euclidean distance approach offers no
objective selection criteria, such as the F-to-enter and other statistics in regression and
discriminant analysis packages, the final choice of predictors will be based on model
classification skill. Although the initial tests are conducted for all three pressure levels,
the Euclidean model development is presented here for 700 mb data only.

Fig. 9. Reconstructed 250 mb vorticity fields for clean set composites. Time
intervals and contours are similar to Fig. 7.

Potential predictors are first screened by the ability to distinguish the clean set
recurving cases from the straight-moving cases (NR) at each 12-h time (R-00h through
R-96h). In this procedure, the clean set cases in one time-to-recurvature group (plus or
minus 12 h) and all straight-moving storm cases are classified as recurvers
(straight-movers) if the case is closer in Euclidean distance to the clean set mean values for the
time-to-recurvature group (PRNR group). The skill in identifying the storm type is
expressed as the percent correctly classified.

To illustrate the importance of the choice of the predictor set modes, clean set
classifications into recurvers versus straight-movers using 200 randomly selected sets of
ten EOF modes are compared in Fig. 10. One hundred sets of ten modes are selected
randomly from the first 45 EOF modes (top) and 100 sets are formed from EOF Mode
1 plus nine other randomly selected modes (bottom). The combined skill in
distinguishing between recurving and straight-moving storms is better than 50% for all times
before recurvature for all random sets. The highest skill is achieved when EOF Mode 1
is forced and ranges from 95% at R-00h to about 80% at R-48h to R-96h. However,
the skill among the random sets varies by as much as 40 percentage points. In the tests
with ten randomly selected predictors (top, Fig. 10), notably better classification skill in
the R-00h through R-36h groups also is achieved when Mode 1 is included. These
results illustrate the importance of Mode 1 in distinguishing recurving storm vorticity
fields near recurvature time. However, the remaining EOF predictors are necessary to
discern the R-48h through R-96h recurving storm vorticity fields from the
straight-mover fields. The problem is how to determine the optimum set of predictors without
having to evaluate all possible combinations of the first 45 EOF modes.
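
For a sense of scale, the size of the search space can be counted directly. This worked count is added for illustration only and assumes that the order of the modes within a set does not matter, which holds for the Euclidean distance:

```latex
\binom{45}{10} = \frac{45!}{10!\,35!} \approx 3.2 \times 10^{9},
\qquad
\sum_{k=1}^{45} \binom{45}{k} = 2^{45} - 1 \approx 3.5 \times 10^{13}.
```

The first number is the count of distinct ten-mode sets and the second is the count of all non-empty subsets of the 45 modes, which is why an exhaustive evaluation is impractical.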

Since  the  optimum  set  of  predictors  must  be  able  to  distinguish  recurver  and 
straight  vorticity  fields  at  all  12-h  time  steps  before  recurvature,  the  set  should  consist 
of  some  combination  of  the  modes  that  best  distinguish  at  each  of  the  individual  times 
before  recurvature.  Thus,  recurver  versus  straight-mover  classifications  are  evaluated  for 
each of the 45 EOF modes separately for each 12-h time group. For each
time-to-recurvature group, the first 45 modes are ranked as potential predictors in the order of
their individual skill. Then a prototype set of predictors is formed from the two
predictors with the highest individual skill. If the skill for this set is greater than when only
the highest individual predictor is included, the second predictor is retained in the set.
This stepwise process is continued by including the individual predictor with the next
highest skill until the 45th best EOF mode is evaluated. In each step, the new predictor
is retained only if the percent of correct classifications is increased over the previous step.
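
The stepwise procedure above amounts to a greedy forward selection, which can be sketched as follows. The names `stepwise_select` and `toy_skill` are hypothetical, as are the skill values; in practice the `skill` callable would return the percent of clean set cases correctly classified by the Euclidean model for a given list of modes. The `strict` flag is an assumption that mirrors the "greater than" versus "greater than or equal to" retention rules.

```python
def stepwise_select(ranked_modes, skill, strict=True):
    """Greedy forward selection over modes ranked by individual skill.

    A mode is retained if it raises the skill of the current set
    (strict=True) or at least does not lower it (strict=False).
    """
    selected = [ranked_modes[0]]
    best = skill(selected)
    for mode in ranked_modes[1:]:
        trial = selected + [mode]
        trial_skill = skill(trial)
        if trial_skill > best or (not strict and trial_skill == best):
            selected, best = trial, trial_skill
    return selected, best

def toy_skill(modes):
    """Hypothetical percent correct for a set of modes (illustration only)."""
    individual = {1: 70.0, 2: 60.0, 3: 55.0}
    score = max(individual[m] for m in modes)
    if {1, 2} <= set(modes):
        score += 10.0   # modes 1 and 2 together help; mode 3 adds nothing
    return score

# Rank the candidate modes by their individual skill, best first.
ranked = sorted([3, 1, 2], key=lambda m: toy_skill([m]), reverse=True)

strict_set, strict_skill = stepwise_select(ranked, toy_skill, strict=True)
relaxed_set, relaxed_skill = stepwise_select(ranked, toy_skill, strict=False)
```

In this toy case the strict rule keeps modes 1 and 2, while the relaxed rule also keeps mode 3 because it leaves the skill unchanged. The procedure evaluates at most 45 candidate sets per time group rather than every subset, at the cost of not guaranteeing the globally best set.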
Fig. 10. Euclidean method classification skill into recurvers and straight-movers
using randomly selected EOF predictors. Classification skill for the clean set cases
using 100 sets of ten randomly selected EOF modes (top) and using 100 sets of EOF
Mode 1 plus nine other randomly selected modes (bottom). Clean set cases in each
time-to-recurvature group (R-00h through R-96h) (abscissa) are distinguished from the clean
set straight-mover cases (NR). The percent correct classifications (ordinate) is for
both the recurving and straight-moving storm cases.
Surprisingly, only the EOF Mode 1 is included in the 700 mb set for the R-00h
group using this stepwise screening process. That is, no other mode increases the
classification skill relative to EOF Mode 1 alone. However, the skill of this Mode 1
Euclidean model in distinguishing clean set recurver and straight-mover cases (top, Fig.
11) rapidly declines from 95% for R-00h to 75% for R-36h and only 50% at R-96h.
This result is consistent with Fig. 10, and indicates that Mode 1 alone is not adequate
for distinguishing recurvers versus straight-movers at other times prior to recurvature.
When the stepwise addition of predictors is applied at each of these times, multiple
modes are selected. The skill of these sets of predictors to distinguish recurver and
straight storm cases (bottom, Fig. 11) ranges from 80-95%. Consequently, this result
indicates the optimum performance of a Euclidean model with the dependent set of
clean storms. In practice, the time to recurvature is unknown and the forecaster would
not know which of these sets for individual times would apply. The objective is then to
select a set of predictors that can be applied at all times, but does not degrade too
severely from the optimum performance at the individual times shown in Fig. 11.

Potential overall best sets are formed from the EOF modes included in the
separate sets determined for each time step in Fig. 11 plus other time-step sets. Two
additional time-step sets are formed using the less restrictive selection criterion that inclusion
of a specific EOF mode does not change (degrade) skill. In another time-step set
selection approach, the EOF modes simply are entered in numerical order, rather than
according to their relative skill in discerning storm type. Using the lower mode EOF
coefficients, which are less likely to contain noise than the higher modes and are related
to larger scale features in the vorticity fields, may provide more reliable separation
among the classification groups. A summary of these five selection criteria for the
time-step sets is given in Table 6. Since each of these selection criteria leads to the
inclusion of different EOF modes in the Euclidean method for the time-step groups, no
consensus is evident for use in forming the overall best sets.

Fig. 11. Euclidean method classification skill using time-step sets of EOF
predictors. Classification skill as in Fig. 10, except for only the EOF Mode 1 (top) and
for separate sets of EOF predictors at each time-to-recurvature group (bottom).
Recurvers (dotted), straight-movers (dashed) and the total correctly classified in
both storm categories (solid) are indicated.

Various subjective criteria involving the number of times an EOF mode is
selected for one of the individual time-step sets are tested to form an overall best set. For
the collection of R-00h through R-96h predictor sets selected using one of the criteria
A through E in Table 6, a mode may be required to appear in a certain number of these
individual sets to be included in a potential overall best set. Each potential overall best
set of predictors is evaluated by scoring the Euclidean distance classifications into the
12-h time-to-recurvature groups (R-00h through R-96h) plus PRNR. Classification
matrix scores for the dependent clean set (161 cases) and for the learning set cases not
belonging to any of the clean set storms (458) are presented in Table 7.
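
The rule that a mode must appear in a minimum number of time-step sets can be sketched as a simple count. The function name `overall_best_set`, the threshold argument `min_count`, and the mode lists below are hypothetical and do not reproduce the sets actually selected in this study.

```python
from collections import Counter

def overall_best_set(time_step_sets, min_count):
    """Modes that appear in at least min_count of the time-step sets."""
    counts = Counter(
        mode for modes in time_step_sets.values() for mode in set(modes)
    )
    return sorted(mode for mode, n in counts.items() if n >= min_count)

# Hypothetical time-step predictor sets (EOF mode numbers).
time_step_sets = {
    "R-00h": [1],
    "R-12h": [1, 3, 7],
    "R-24h": [1, 3, 12],
    "R-36h": [3, 7, 20],
}

in_two_or_more = overall_best_set(time_step_sets, min_count=2)
in_three_or_more = overall_best_set(time_step_sets, min_count=3)
```

Raising `min_count` shrinks the candidate set toward the modes that are useful at most lead times; each resulting set would then be scored with the classification matrix scores as described above.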


Table 6. EOF MODE SELECTION CRITERIA FOR EUCLIDEAN METHOD
TIME-STEP PREDICTOR SETS: Sets are formed from the stepwise
selection of the first 45 EOF modes in the order (column 2) of their
individual skill in distinguishing between clean set recurvers and
straight-movers (predictability) and/or simply in numerical order. A mode is
selected (column 3) if the new set skill is greater than (GT), or greater than
or equal to (GE) the skill before the addition of that mode. This stepwise
process is continued until all 45 modes are tested. Then, the total number
of predictors retained in the set is limited to the number specified in
column 4. "NONE" indicates that no restriction is placed on the total
number of predictors that may be retained in the set. "10 (MIN)" indicates
that only the minimum number of modes required to achieve the same skill
as the first ten modes selected in the stepwise selection process are
ultimately retained in the time-step set.
SELECTION
PREDICTABILITY
E

NUMERICAL

GE
The stability of the Euclidean model is judged first by comparing the D-score
for the independent and dependent samples. This D-score evaluates only the categories
R-00h through R-96h plus NR that comprise the dependent sample. As expected, skill
is best (D-score = 1.78-2.07) for the dependent set classifications. The degradation in
the D-scores for the independent sample, which range from 2.56 to 2.73, is not linear.
For example, the second-best score for the independent sample (2.57) is for a model that
has the worst D-score (2.07) for the dependent sample. In addition, the model with the
best D-score (1.78) in the dependent sample has one of the worst D-scores with the
independent sample. Notice that higher skill is attained when the selection of the EOF
predictors is according to their relative predictability (lines 1-6) than if selection is simply
in numerical order (lines 7-9). However, the selection of such a large number of EOF
predictors, and especially the selection of such high order modes as 41 and 42, is a
concern. It may be that the dependent set is being well described, but this is at the
expense of degradation in the independent sample performance.

Table 7. CLASSIFICATION SKILL FOR TWO METHODS OF SELECTING
EUCLIDEAN MODEL PREDICTORS: 700 mb classification skill in
terms of D- and I-scores for the independent set forecasts (first two
columns) and D-scores for the dependent (clean set) forecasts. Predictor
EOF modes in lines 1-9 are from various subjective combinations of the
sets of predictors that separate the individual time-to-recurvature groups
(R-00h through R-96h) from the PRNR group. Selection criteria for these
time-step sets (column 4) are explained in Table 6, and the number in
parentheses indicates the number of R-00h through R-96h sets in which an
individual mode must have appeared to be retained in the potential overall
best set. R-00h (lines 10 and 11) time-step sets are also evaluated as
Euclidean model predictors.

Another subjective, but physically based, approach can be used in the selection
of predictors for the Euclidean method. Recall that the time-to-recurvature coefficients
in Fig. 4 trace out a smooth path in the EOF 1 - EOF 2 domain. Since these coefficients
are in time order, increasing the geometric distance between the beginning and end time
coefficients should also increase the distances between the intermediate time values.
Thus, the hypothesis is that the set of predictors that best distinguishes between the
R-00h and NR clean set cases may also be best for identifying the intermediate 12-h
time-to-recurvature cases. To test this hypothesis, the R-00h sets formed using the
stepwise selection criteria A and C in Table 6 are evaluated as overall Euclidean model
predictor sets (bottom group, Table 7). Both sets are selected from the first 45 EOF
modes in order of their individual predictability. As indicated above, only EOF Mode
1 enters the 700 mb model if selection criterion A is applied. If the mode retention
criterion is relaxed to just greater or equal (selection criterion C), seven additional modes are
included in the set. These additional modes significantly improve dependent sample
classification skill (D-score = 1.87 versus 2.27 obtained using Mode 1 alone). Rated
on the D-score performance, independent sample classification skill is also higher (2.48
versus 2.55 for Mode 1 alone). Surprisingly, the I-score for the independent sample
classifications is slightly lower for Mode 1 alone (2.40 versus 2.41 for the eight-mode
set). This indicates that the
independent sample PR cases, included only in the I-score, must be well classified using
EOF Mode 1 alone. Both Euclidean models based on the R-00h sets demonstrate higher
skill in classifying the independent sample (D-score = 2.48-2.55 and I-score =
2.40-2.41) than those models based on the predictors common to all R-00h through R-96h
sets (D-score = 2.56-2.73 and I-score = 2.52-2.60). Therefore, the conclusion from
these tests is that the EOF modes that best distinguish between the R-00h and
straight-mover cases also provide the best Euclidean model skill in identifying the correct 12-h
time-to-recurvature (R-00h through R-96h) or non-recurvature (PRNR) forecast group.

Based on the above conclusions, the search for an overall best set of predictors
for the Euclidean model is confined to those sets that provide the best distinction
between the clean set R-00h and straight-mover cases. Since the problem is reduced to the
separation of only two categories of data, univariate hypothesis testing can be used to
identify the modes with the greatest difference between R-00h and NR mean values.
Individually these modes provide the greatest separation between the R-00h and PRNR
groups in one-dimensional space. Therefore, some combinations of these modes also
should provide the best separation of the R-00h and PRNR classification groups in
multidimensional EOF space. An EOF mode is identified as having significantly
different R-00h and NR means if the p-value for a two sample t-test of the clean set R-00h
and NR coefficients for that EOF mode is less than or equal to 0.01. Since the p-value
is the smallest significance level at which the null hypothesis (that the R-00h and NR
means are equal) can be rejected, this test objectively identifies the modes with the
greatest separation between the R-00h and NR means (Fig. 12). As expected, the largest
difference between the mean EOF values for the R-00h and NR groups at 700 mb is for
Mode 1. Ten other modes also have significant differences in mean EOF values
according to this test. Overall predictor sets are then chosen from among these significant
modes using the stepwise selection criteria described in Table 8.
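
The statistic behind the two sample t-test can be sketched with the standard library alone. This sketch assumes the pooled (equal variance) form of the test, which the text does not specify; the function name `pooled_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, dof

# Hypothetical coefficients of one EOF mode for the two clean set groups.
r00_coeffs = [1.0, 2.0, 3.0]
nr_coeffs = [2.0, 3.0, 4.0]

t_stat, dof = pooled_t(r00_coeffs, nr_coeffs)
```

Converting the statistic to a p-value requires the Student t distribution, which is not in the Python standard library; a statistics package routine such as `scipy.stats.ttest_ind` returns the p-value directly. A mode would then be retained as significant when that p-value is at most 0.01.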

Euclidean model skill in identifying recurvers and straight-movers is compared
in Fig. 13 for R-00h predictors selected from only the significant modes (top) and for the
same number (eight) of predictors selected from among the 45 EOF modes (bottom).
Both of these sets demonstrate better skill in distinguishing recurvers and
straight-movers than the Mode 1 model in Fig. 11 (top), and less skill in the R-36h
through R-96h periods than the optimum time-step model in Fig. 11 (bottom). One
advantage of the model based on the significant modes is that the separate levels of skill
for recurvers and straight-movers are more consistent. By contrast, the nearly equal
combined skill for the numerical EOF mode model is gained by much better skill for
straight-movers than for recurvers.

Fig. 12. Significance testing for Euclidean clean set R-00h versus NR
means. Solid vertical bars indicate the 700 mb EOF modes (abscissa) that have
significantly (p-value ≤ 0.01) different R-00h and NR mean coefficient values
(ordinate) based on a two sample t-test.

Table 8. SIGNIFICANT MODE SELECTION CRITERIA FOR R-00H
PREDICTOR SETS: Criteria as in Table 6, except applied only to those
modes identified by a two sample t-test as having significantly (p-value ≤
0.01) different R-00h and NR mean coefficient values.

The final step in the Euclidean distance model development is then to evaluate
the R-00h predictor sets (F through J in Table 8) and identify the set and pressure level
with the highest time-to-recurvature classification skill. The classification matrix scores
for the independent and dependent sample classifications for the EOF modes selected
on the basis of hypothesis tests are presented in Table 9.

Even though the number of EOF modes is limited by the significance testing,
the selection criteria in Table 8 can lead to different Euclidean models. Except for the
700 mb Mode 1 model, the largest sets of predictors are selected at 700 mb (6-8 modes)
and 400 mb (4-8 modes). The 250 mb models using only two or five modes demonstrate
the best skill in identifying the 12-h time-to-recurvature groups in the dependent sample
(250 mb D-score = 1.81-1.83, 400 mb D-score = 1.81-1.92 and 700 mb D-score =
1.93-2.27). As noted previously for the Euclidean models in Table 7, the degradation in
the D-score for the independent samples typically is not linear. For the independent
sample, the skill levels for 250 mb (D-score = 2.43-2.45 and I-score = 2.40-2.45) and 700 mb
(D-score = 2.44-2.55 and I-score = 2.36-2.40) are nearly comparable. Less skill is

Fig. 13. Euclidean method classification skill using R-00h sets of EOF
predictors. Classification skill for 700 mb as in Fig. 10, except for a R-00h set
chosen (selection criteria I in Table 8) from only those modes identified with
significantly different R-00h and NR clean set EOF mean coefficient values according to
a two-sample t-test (top), and for a R-00h set of a similar number (eight) of modes
selected (similar to criteria D in Table 6, except limited to eight vice ten predictors)
from all 45 EOF modes (bottom).
Table 9. CLASSIFICATION SKILL FOR EUCLIDEAN MODELS AT THREE
PRESSURE LEVELS: Classification skill as in Table 7 for the five
selection criteria F through J (described in Table 8) of EOF mode predictors
at each pressure level with significantly different (two-sample t-test p-value
≤ 0.01) clean set R-00h and NR mean EOF coefficient values.
noted at 400 mb (D-score = 2.56-2.67 and I-score = 2.49-2.62). These sets of Euclidean
model predictors identified by significance testing tend to outperform the R-00h sets
selected (criteria A through E in Table 6) from all 45 modes (not shown), unless by
chance they contain the same modes.

Judged on the independent sample classification matrix D-scores, the best
Euclidean distance model using the 250 mb vorticity includes EOF Modes 1, 6, 10, 12
and 15 (lines 12 and 14). Two advantages of this set are that only five predictor variables
are required and no EOF mode greater than 15 is included. By contrast, the best 700
mb set selected using criteria I has eight EOF modes, and includes higher order modes
such as 31, 34 and 39.

C.  MODEL  EVALUATION 

The final Euclidean model at 250 mb is evaluated in terms of skill in classifying the
learning set of 782 cases (Table 10). The combined skill in correctly identifying recurvers
(75%) and straight-movers (68%) during the 72-h forecast period is 71%. This
compares with %R, %S and %T scores of 36, 64 and 54 for climatology (Table 5). Skill in
identifying the time to recurvature is best near recurvature (R-00h = 45%, R-12h =
21% and R-24h = 35%), and in the straight-track categories (R-96h = 38% and PRNR
= 35%). The higher skill at the ends of the forecast interval may be because the EOF
predictor modes were selected to achieve maximum separation of the R-00h and the NR
mean EOF coefficients. In addition, there may be more variability in the vorticity fields
as recurvature conditions develop (R-36h through R-84h).

Table 10. CLASSIFICATION MATRIX FOR FINAL EUCLIDEAN
MODEL: Classifications for observations in each 12-h verification
category and the percent correctly forecast by the 250 mb Euclidean model.
Percent of recurvers and straight-movers correctly predicted is also listed.
Bar charts (Fig. 14) of the percent of learning set cases in each 12-h verification
category that are classified into each time-to-recurvature group further confirm the
relatively poor ability of the Euclidean method for the R-84h through R-36h cases. The
intermediate 12-h categories not shown in Fig. 14 tend to have characteristics similar to
the 24-h bar charts. Times near recurvature and in the straight-track categories are
better classified and are also more likely to be classified within only one or two
classification groups of the correct value. Cases in the intermediate forecast intervals (R-36h
through R-84h) are more likely to be misclassified, and the classification errors, in terms
of the number of 12-h categories between the forecast and the verification groups, are
greater.

The  Euclidean  model  classifications  presented  above  have  higher  skill  than  the 
climatological  forecasts  of  the  learning  set  cases  (Table  5).  For  example,  the  I-  and 


Fig. 14. Classification bar charts at 24-h intervals for the Euclidean
model. Percent of N cases (ordinate) verifying as R-00h (top left), R-24h (top
right), R-48h (middle left), R-72h (middle right), R-96h (bottom left), and PRNR
(bottom right) that are classified into each group R-00h through PRNR (abscissa).
Shaded bars indicate the percent in the correctly classified category.
R-scores of these Euclidean model 12-h forecasts are 2.34 and 2.10 versus 3.93 and 5.30
for the climatological forecasts, respectively. However, the skill in identifying the time
to recurvature is less than desired for operational use. Because of the subjectivity in the
development of the Euclidean model, these results should not be used to make final
conclusions regarding the usefulness of an EOF representation of vorticity to forecast
tropical cyclone recurvature. No definitive method was found for selecting the optimum
set of EOF predictors in the Euclidean method. In addition, each EOF mode that is
selected is given the same weighting, rather than assigning additional influence to the
modes that have the most significance. Furthermore, the use of a small clean set of
storms in this approach may not provide the most robust definition of the
time-to-recurvature classification groups. Nevertheless, the Euclidean method has an easily
understood physical interpretation for using an EOF approach in identifying the vorticity
patterns associated with recurvature. The above results indicate the approximate levels
of skill that can be expected using these predictors. However, a more objective approach
is needed to identify the optimum set of EOF predictors and to better exploit the relative
contributions of each mode in the recurvature forecast model.
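
The equal-weight Euclidean classification discussed above can be sketched as a
nearest-group-mean rule in EOF coefficient space. This is only an illustrative Python
sketch: the function name, the group labels used as dictionary keys and the numerical
values are assumptions, not values from this study.

```python
import math

def classify_euclidean(coeffs, group_means):
    """Return the label of the group mean closest to `coeffs`.

    coeffs      -- EOF coefficients of one case, one value per selected mode
    group_means -- dict mapping group label -> mean coefficients (same mode order)
    Every selected mode carries the same weight in the distance.
    """
    best_label, best_dist = None, math.inf
    for label, mean in group_means.items():
        dist = math.sqrt(sum((c - m) ** 2 for c, m in zip(coeffs, mean)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical two-mode group means (illustrative values only).
example_means = {
    "R-00h": (-2.0, 1.0),
    "R-48h": (0.0, 0.5),
    "PRNR": (2.0, -1.0),
}
example_label = classify_euclidean((1.6, -0.7), example_means)
```

Weighting the modes unequally would amount to scaling each squared difference before
summing, which is one way to read the limitation noted in the text.
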
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IV.  DISCRIMINANT  ANALYSIS  APPROACH 


The approach in this section is to use discriminant analysis techniques to better
exploit the predictive skill of EOF coefficients of vorticity in forecasting tropical cyclone
recurvature.  The  UCLA  Biomedical  Computer  Program  BMDP7M  (Dixon  1988)  is  used 
to  select  the  predictors  and  develop  the  discriminant  analysis  model.  Although 
discriminant  analysis  is  a  seemingly  more  objective  approach  than  the  Euclidean  distance 
method,  the  user  must  still  make  many  choices  both  in  its  application  and  evaluation. 
Searching  for  the  optimum  discriminant  analysis  model  requires  extensive  testing  and 
should  be  conducted  on  a  much  larger  sample  population.  Thus,  the  goal  of  this  study 
is  to  isolate  a  justifiable  prediction  model  that  indicates  the  potential  of  this  method. 

A.  DISCRIMINANT  ANALYSIS 

Discriminant analysis is a statistical procedure for identifying the boundaries
between groups in terms of the variable characteristics that distinguish one group from
another. It is used to classify cases into one of several groups and to examine the
relative contributions of one or more variables in distinguishing between groups. The
procedure was first introduced by Fisher (1936). The Fisher discriminant function has the
form

$$Z = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p, \qquad (4.1)$$

where $Z$ is the discriminant score, $X_1, X_2, \ldots, X_p$ are the values of each predictor and
$a_1, a_2, \ldots, a_p$ are coefficients that, if standardized by pooled standard deviations, give an
indication of the relative weight of each predictor. Discriminant functions are derived
such that the differences in discriminant scores or the relative distances between groups
are maximized. The first function separates the members of the most distinguishable
group, $K_1$, from the remainder of the groups, $K_2$ through $K_g$. The second discriminant
function separates the next most recognizable group, $K_2$, from the remaining groups, $K_3$
through $K_g$. The number of functions required is one less than the total number of
groups, $g$. For each discriminant function, a cutoff score is found by taking the mean
of the average score for all cases in the group $K_i$ and the average score for all cases in
the remainder of the groups $K_{i+1}$ through $K_g$. An individual case is classified into group
$K_1$ if its discriminant score $Z_1$ is greater than the cutoff score for the first discriminant
function. If the discriminant score is less than the cutoff, a second discriminant score
is calculated using the second discriminant function. The second discriminant score is
compared to a second cutoff score to determine if the case is in group $K_2$ or the
remaining groups $K_3$ through $K_g$. The process continues until the case is classified.
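
Read literally, the cutoff for each discriminant function is the midpoint of two mean
scores. The notation in this sketch is assumed here for illustration rather than taken
from the text: $Z_i$ is the score from the $i$-th function, $c_i$ is its cutoff, $G_i$
denotes the group that function isolates, $R_i$ denotes the pooled remaining groups,
and an overbar denotes an average over the cases in the set named in the subscript.

$$c_i = \tfrac{1}{2}\left(\bar{Z}_{i,G_i} + \bar{Z}_{i,R_i}\right), \qquad
\text{classify into } G_i \text{ if } Z_i > c_i, \text{ otherwise evaluate } Z_{i+1}.$$

For example, if the isolated group averages 2.0 on the first function and the pooled
remainder averages -1.0, then $c_1 = 0.5$, and a case scoring $Z_1 = 0.8$ is assigned to
the first group.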

A simpler adaptation of Fisher's classification procedure is used in statistical
packages such as BMDP7M (Klecka 1980 and Dixon 1988). A classification function for
each group is derived as a linear combination of coefficients and predictors plus a
constant term. Predictors can be specified or they can be selected in a stepwise fashion
based on user-specified criteria. To determine group membership, each function is
evaluated using the predictor values of the test case to obtain a classification function
score for each group. The case is classified into the group for which it has the highest
classification score.
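
The classification-function rule above (one linear function per group, highest score
wins) can be illustrated with a short Python sketch. The constants and coefficients
below are hypothetical and are not BMDP7M output; the function names are likewise
assumptions made for the example.

```python
def classification_scores(x, functions):
    """Evaluate one linear classification function per group.

    x         -- predictor values for the test case
    functions -- dict: group label -> (constant, list of coefficients)
    """
    return {
        label: const + sum(c * xi for c, xi in zip(coefs, x))
        for label, (const, coefs) in functions.items()
    }

def classify(x, functions):
    """Assign the case to the group with the highest classification score."""
    scores = classification_scores(x, functions)
    return max(scores, key=scores.get)

# Hypothetical constants and coefficients for two predictors.
example_functions = {
    "recurver": (-1.0, [-0.8, 0.3]),
    "straight": (-0.5, [0.6, -0.2]),
}
example_group = classify([2.0, 1.0], example_functions)
```

Because each group has its own function, the coefficients are compared only through
the resulting scores, which is consistent with the remark that they cannot be
interpreted like standardized discriminant function coefficients.
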

Classification function coefficients cannot be standardized and interpreted in the
same manner as discriminant function coefficients because there is a different function
for each group. However, discriminant functions can be computed from classification
functions to examine the relationship between predictors and group classification (Afifi
and Clark 1984). More commonly, statistics derived from canonical correlation analysis
techniques are used for this purpose. Canonical correlation analysis examines the linear
relationship between independent variables (predictors) and one or more sets of
dependent variables (groups). A linear combination of predictors called a canonical
variable or canonical discriminant function is formed that provides the best separation
among groups. Second and subsequent canonical discriminant functions are then
formed that are orthogonal and best separate the groups on the basis of associations not
used in the preceding canonical discriminant functions. The maximum number of
canonical discriminant functions is equal to the number of groups minus one or the
number of predictor variables, whichever is less. Canonical discriminant functions can
also be used to classify. Final classifications will generally be identical to those obtained
with classification functions unless the group covariance matrices are not equal (Klecka
1980). A complete discussion of the application of canonical correlation statistics to
discriminant analysis can be found in Klecka (1980) or Afifi and Clark (1984).


B.  MODEL  ISSUES 

Several issues basic to the development of a discriminant analysis model are
considered in this section. These issues include the selection of a dependent sample, how far
in advance recurvature can be recognized, and the optimum number and composition
of the classification groups. Decisions in these areas are based on the ability of EOF
modes to predict recurvature as well as the classification goals of the forecast model.
These decisions, in combination with choices in the application of the discriminant
analysis method, will affect the level of classification skill that can be achieved with a
given set of predictors and predictands. Only 250 mb data are considered in this section,
because the data for this level provided the best discriminating power in the Euclidean
distance approach and in comparative tests (not shown) using discriminant analysis.

1.  Dependent  Sample  Selection 

Ideally, the sample population should be divided into dependent and
independent subsets to permit validation of the discriminant analysis classification model.
Classification functions may be fit well to a small dependent sample, but not be effective in
predicting an independent sample. Independent testing is thus necessary to better
estimate the ability to correctly predict the total population. Opinions vary on the
appropriate sizes of the subsets. However, the dependent subset must be sufficiently large to
ensure the stability of the classification function coefficients (Klecka 1980).

Several aspects of the discriminant analysis must be specified to test the effect
of the dependent subset options. As a first test, the classification groups will be the same
ten categories as in the Euclidean distance approach: recurvature time to recurvature
time minus 96 h in 12-h increments plus the non-recurvers. Although only
straight-mover storm data are used to describe the non-recurver group while developing the
discriminant analysis model, later tests will consider the observations more than 96 h
prior to recurvature as part of the straight-mover set. Classification functions are
derived from predictors selected in a stepwise fashion using a common F-to-enter value of
2.5. Although dependent subsets vary in size from 158 to 510 cases, this F-to-enter value
is significant at better than the 99th percentile for all subsets. Therefore, differences in
predictors selected in the discriminant analysis procedure can be mainly attributed to
statistical differences among the dependent subsets. The classification functions then
are used to classify both the dependent subset and the remaining independent cases,
including pre-recurver cases. Classification matrix scores (described in Section II.C.2) are
computed for dependent and independent subset classifications separately and for the
entire sample classifications.
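
As a rough illustration of the F-to-enter test, the sketch below computes the one-way
analysis of variance F statistic of a single candidate predictor across the classification
groups, which is assumed here to be the quantity compared with the threshold at the
first step of a stepwise selection. Later steps use partial F statistics conditioned on the
predictors already entered, and these are not shown. The function name and the data are
hypothetical.

```python
def f_to_enter_first_step(groups):
    """One-way ANOVA F statistic for a single candidate predictor.

    groups -- list of lists; each inner list holds the predictor values
              (e.g. one EOF coefficient) for the cases in one classification group.
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

F_TO_ENTER = 2.5  # threshold quoted in the text

# Hypothetical coefficient values for three groups (illustrative only).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
example_f = f_to_enter_first_step(example_groups)
enters = example_f > F_TO_ENTER
```
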

The  purpose  of  the  intercomparison  of  the  classification  models  derived  from 
13  different  dependent  subsets  of  the  250  mb  sample  (Table  11)  is  to  test  the  stability 
of  the  classification  functions.  Whole-storm  data  from  the  same  set  of  clean  recurving 
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and  straight-moving  storms  in  the  Euclidean  distance  approach  are  used  to  form  the  first 
dependent  subset.  Since  EOF  coefficients  tend  to  progress  in  a  similar  manner  as  storms 
approach  recurvature,  analyzing  a  subset  comprised  of  entire  storms  may  lend  statistical 
stability  to  the  analysis.  Two  other  whole-storm  dependent  subsets  are  formed  from  all 
1979  to  1982  storms  and  from  a  random  selection  of  two-thirds  of  the  storms  in  the 
sample  population.  To  test  the  stability  of  these  classification  functions,  ten  dependent 
subsets  are  formed  by  random  selection  of  two-thirds  of  the  cases  in  the  sample 
population. 
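
The difference between selecting two-thirds of the storms and two-thirds of the cases
can be made concrete with a small Python sketch. The record layout (a dictionary with a
"storm" key), the function names and the fixed seed are assumptions made for the
example only.

```python
import random

def split_by_case(cases, fraction=2 / 3, seed=0):
    """Randomly assign individual cases to dependent/independent subsets."""
    rng = random.Random(seed)
    n_dep = round(len(cases) * fraction)
    dep_idx = set(rng.sample(range(len(cases)), n_dep))
    dependent = [c for i, c in enumerate(cases) if i in dep_idx]
    independent = [c for i, c in enumerate(cases) if i not in dep_idx]
    return dependent, independent

def split_by_storm(cases, fraction=2 / 3, seed=0):
    """Randomly assign whole storms, so all cases of a storm stay together."""
    rng = random.Random(seed)
    storms = sorted({c["storm"] for c in cases})
    dep_storms = set(rng.sample(storms, round(len(storms) * fraction)))
    dependent = [c for c in cases if c["storm"] in dep_storms]
    independent = [c for c in cases if c["storm"] not in dep_storms]
    return dependent, independent

# Hypothetical sample: 6 storms with 4 cases each.
example_cases = [{"storm": s, "case": k} for s in "ABCDEF" for k in range(4)]
dep_c, ind_c = split_by_case(example_cases)
dep_s, ind_s = split_by_storm(example_cases)
```

With a split by case, observations from one storm can fall on both sides of the split;
with a split by storm they cannot.
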

Table 11. DEPENDENT SUBSET SELECTION: Stepwise discriminant analysis
of the times to recurvature for 13 dependent/independent subsets of 250
mb vorticity EOF coefficients, which are indicated in the order they were
selected.

DEPENDENT                PREDICTORS          INDEPENDENT  DEPENDENT  COMBINED
SUBSET            N      (EOF MODES)         D-SCORE      D-SCORE    I-SCORE

CLEAN STORMS      158    1 19                ...          ...        2.59
79-82 STORMS      510    1 2 3 5 36 6 45     ...          ...        2.40
RANDOM STORMS     449    1 3 2 5 41 7 6      ...          ...        2.17
RANDOM CASES 1    442    1 2 3 5 15          2.47         2.37       2.41
RANDOM CASES 2    457    1 2 3 5 24 6 41     2.27         2.20       2.23
RANDOM CASES 3    454    1 2 5 3 9 6 24      2.32         2.08       2.15
RANDOM CASES 4    443    1 2 3 6 6           2.49         2.35       2.36
RANDOM CASES 5    427    1 5 2 3 6 24 23     2.50         2.08       2.25
RANDOM CASES 6    435    1 3 2 5 45          2.55         2.48       2.50
RANDOM CASES 7    452    1 2 3 6 41 14       2.38         2.24       2.31
RANDOM CASES 8    451    1 3 5 14            2.33         2.23       2.22
RANDOM CASES 9    426    1 2 5 7 24 6        2.11         2.15       2.10
RANDOM CASES 10   435    1 2 5 24 6 4        2.12         2.11       2.08

The results in Table 11 reflect differences due to the independent sample
composition as well as to the dependent sample composition. If the classification functions
were very stable, the various methods of subsampling in Table 11 should have involved
the same predictors and have nearly equivalent dependent-independent verification
scores. In practice, predictors vary in number, modes and in the order selected. This
order is not necessarily indicative of their relative importance because a strong
discriminator may be selected late or not at all in a stepwise analysis if the
intercorrelation with other variables reduces its unique contribution to the analysis. The Mode 1 EOF
coefficient is the only predictor selected for all 13 dependent subsets tested. Modes 5, 2
and 3 are selected in 12, 11 and 10 of the subsets, respectively. Since Mode 6 appears in
eight of the subsets, it is potentially important in the discriminant analysis. Notice that
Mode 4 is selected only once, and that many higher order modes are selected after Mode
6.

Classification matrix scores also vary. Classification functions derived for the
clean storm sample in Table 11 demonstrate a high degree of skill in classifying the
dependent subset cases, but perform poorly in classifying independent subset cases. The
combined classification matrix score for this subset pair is much worse than for other
pairs, which indicates that the model is well-fitted to the dependent subset only and is
not accurate on an independent subset. This result may suggest a flaw in the use of the
clean storm set for the Euclidean method, where the excellent distinction in the
dependent set was not sustained in the remaining cases.

The independent test results can be overly optimistic if the subset contains a
disproportionate number of cases that are statistically easy to classify. For example, the
classification functions derived from 1979-1982 storm data demonstrate better skill in
classifying the independent subset (1983-1984) than the dependent subset. This
unexpected result can be explained by examining the differences in the storms between the
two subsets. Patrick Harr (personal communication) found that western North Pacific
tropical cyclones during 1983 and 1984 had recurvature tracks that were similar to
climatology and were relatively easy to forecast in comparison to those in the previous
four years.

The 10 random subsets in Table 11 were generated to test whether the
classification functions derived from dependent subset predictors would work equally well on
the independent subset. In other words, the dependent and independent subsets of
randomly selected cases should have nearly equal classification matrix scores. Only the
two subset pairs formed from the ninth and tenth randomly selected cases have nearly
equal dependent and independent classification matrix scores. Because the classification
model derived from randomly selected
storms (line 3 in Table 11) outperforms those derived from randomly selected cases in
dependent subset classification, retaining data from entire storms in the dependent set
may aid in the derivation of skillful classification functions. However, the marked
degradation in the independent D-score for the random storm independent subset indicates
that the differences in skill are also a function of which subset contains more storms that
are inherently easier to classify.

The conclusion from Table 11 is that classification functions derived from
randomly selected subsamples of this data set are not statistically stable. To improve the
stability, the entire sample population will later be used to both derive and test
classification functions. In lieu of independent testing, jackknifing is employed to assess
the degradation in classification skill expected in the total population. In this procedure,
N sets of classification functions are derived by successively withholding one case from
the sample of N cases. Each of the N sets of classification functions is tested on the one
case that was withheld, and the summation of these verifications is an indication of the
likely accuracy of a single discriminant analysis based on the entire sample. Although
jackknifed results are computed for each discriminant analysis, they will be presented
only in the selection and testing of an optimal classification model for this sample of
storms (Sections IV.D and IV.E.1).
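
The jackknife procedure described above can be sketched in Python as a leave-one-out
loop. To keep the example self-contained, a nearest-group-mean rule on a single predictor
stands in for the discriminant analysis itself; that substitution, the function names and
the sample values are all assumptions made for illustration.

```python
def nearest_mean_fit(cases):
    """Group means of one predictor; a stand-in for deriving classification functions."""
    sums = {}
    for value, label in cases:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

def nearest_mean_predict(model, value):
    """Assign the value to the group whose mean is closest."""
    return min(model, key=lambda label: abs(value - model[label]))

def jackknife_accuracy(cases):
    """Withhold each case in turn, refit on the other N-1, test on the withheld case."""
    hits = 0
    for i, (value, label) in enumerate(cases):
        model = nearest_mean_fit(cases[:i] + cases[i + 1:])
        hits += nearest_mean_predict(model, value) == label
    return hits / len(cases)

# Hypothetical (coefficient, group) pairs; values are illustrative only.
example_sample = [(-2.1, "R"), (-1.7, "R"), (-0.2, "R"),
                  (1.9, "S"), (2.4, "S"), (0.1, "S")]
example_accuracy = jackknife_accuracy(example_sample)
```

In this toy sample the two cases nearest the boundary are misclassified once they are
withheld, which is the kind of degradation the jackknife is meant to expose.
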

2.  Limits  of  discrimination  for  time  to  recurvature 

A basic question is the limitation of the discriminant analysis to separate the
EOF coefficients associated with storms more than 96 h before recurvature from the
straight-mover coefficients. To illustrate this limitation, univariate statistics for EOF
Mode 1 coefficients are compared. Mode 1 not only accounts for the largest percent of
the variance in the synoptic vorticity patterns, but also demonstrates the greatest
predictive capability. It is the only predictor consistently selected and is selected first in all
subset analyses (Table 11).

Distributions of the Mode 1 means, 95% confidence intervals and the standard
deviations for times R-00h through R-96h in 12-h increments, plus the pre-recurvers
(PR) and non-recurvers (NR) in the entire 250 mb data set are presented in Fig. 15.
Univariate statistics for a combined PR and NR group are also plotted. Group statistics
can be interpreted in terms of the physical processes they represent. Recall that the
pattern for Mode 1 (Fig. 2) is representative of the vorticity pattern associated with a
storm in the monsoon trough and that the magnitude of the coefficient is indicative of
the importance of the pattern (or the opposite pattern if it is negative). Group means
vary almost linearly from large positive values for non-recurver and pre-recurver
situations to large negative values at recurvature. Variances are large and the considerable
overlap among groups indicates the variability in vorticity patterns that lead to
recurvature. These group means are most separated in the 36 h preceding recurvature.
However, variances are also largest during these times. These large differences are
associated with rapid changes in the storm-centered vorticity patterns accompanying
storms moving around the subtropical ridge. In contrast, vorticity patterns change little
for storms moving along the monsoon trough well prior to recurvature.
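
The group statistics plotted in Fig. 15 (mean, standard deviation and 95% confidence
interval) can be computed along the following lines. This Python sketch uses a normal
approximation for the confidence interval, which is an assumption and may differ from
the method used for the figure; the sample values are hypothetical.

```python
import math
import statistics

def group_summary(values, z=1.96):
    """Mean, standard deviation and an approximate 95% confidence interval of the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1 divisor)
    half_width = z * sd / math.sqrt(n)     # normal approximation (assumed)
    return {"n": n, "mean": mean, "sd": sd,
            "ci95": (mean - half_width, mean + half_width)}

def intervals_overlap(a, b):
    """True if the two confidence intervals share any value."""
    return a["ci95"][0] <= b["ci95"][1] and b["ci95"][0] <= a["ci95"][1]

# Hypothetical Mode 1 coefficients for two groups (illustrative only).
nr = group_summary([2.0, 3.0, 4.0, 3.0, 3.0])
r00 = group_summary([-3.0, -1.0, -2.0, -4.0, 0.0])
```

Well-separated means with non-overlapping intervals, as in this toy pair, correspond
to groups that a single predictor can distinguish; neighboring 12-h groups in the text
do not have that property.
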

The challenge for the discriminant analysis (or the Euclidean method) is to
distinguish those EOF modes that best indicate the time to recurvature. With the similar
group means for NR, PR, R-96h and R-84h times, it is unlikely that the discriminant
analysis could consistently separate these groups from Mode 1 only. The NR group
mean is slightly smaller than the group means for pre-recurvers and the R-96h cases.
Combining non-recurver and pre-recurver subsamples provides a smoother and more
physically plausible transition among groups. That is, the R-96h and R-84h samples
might also have been added to the new PR and NR group. However, since the official
JTWC forecast period is 72 h, retaining the R-96h and R-84h as separate classification
categories provides a forecast 'buffer'. The R-96h and R-84h predictions provide an
alert of a trend toward recurvature, but not within the current 72-h forecast period.
Statistically, these intermediate groups decrease the likelihood that non-recurvature
situations will be misclassified into the next similar group, and thus prompt the forecaster
to erroneously predict recurvature within the 72-h forecast period.

Based  on  the  above  considerations,  the  PR  sample  is  combined  with  the 
straight-mover  sample  to  define  the  PRNR  classification  group.  The  merits  of  other 
data  combinations  are  better  assessed  in  terms  of  gains  in  classifiability  versus  loss  of 
time  resolution.  These  issues  are  explored  in  the  next  section. 


Fig.  15.  Univariate  statistics  for  250  mb  vorticity  EOF  Mode  1.  Mean  (open 
circle),  95%  confidence  interval  (solid  bar)  and  standard  deviation  (x)  for  individual 
groups  (solid  line)  and  for  the  combined  PR  and  NR  group  data  (dotted  line). 
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3.  Number  and  composition  of  classification  groups 

The analysis goal is to fully utilize the time resolution of the data set to predict
the time to recurvature in 12-h increments. However, the EOF coefficients for the
vorticity fields may not have enough discriminating power to reliably discern between
synoptic situations with this time resolution. Perhaps these predictors would be suited
to classify some combination of groups with decreased time resolution. Thus,
combinations of groups are tested to increase the percent of correct classifications and still
retain some of the time resolution desired by the forecaster.

The univariate distributions of the EOF Mode 1 coefficient for each 12-h group in
Fig. 15 indicate that this predictor alone cannot adequately discriminate among
neighboring groups. Other EOF modes may provide additional dimensions that distinguish
differences among the 12-h groups. The effect of multiple predictors and the effect of
combining groups on classification skill are best examined by discriminant analyses and
canonical correlation statistics.

To evaluate the trade-off between time resolution and forecast accuracy,
combinations of time groups are tested that are potentially easier to classify and are still
useful to the forecaster. Stepwise discriminant analysis is performed using F-to-enter
values significant at the 0.01 level for the sample size. Analysis models with two, three
and ten classification groups are compared in Tables 12, 13 and 14. Verifications are
identified by their 12-h data categories so that the loss of time resolution in the
classification groups can be appreciated. Pre-recurver data are combined with non-recurver
data for both classifications and verifications.

The minimum useful distinction for the forecaster is between recurving and
straight-moving storms. The recurver group is defined by R-00h through R-72h
samples, and the straight-moving group is defined by R-84h through pre-recurver and
straight-mover data. Thus, a successful prediction would identify either a recurving or
straight track during the current 72-h forecast period. The two-group discriminant
analysis (Table 12) correctly identifies R-00h to R-72h cases as recurvers with 76%
accuracy. The verifications within each 12-h category do not have the same skill. The
percent correctly classified decreases from 95% for cases at recurvature time to 44% at
R-72h. This is because times closer to recurvature are more distinct from the straight
classification group and are, therefore, more readily recognized as recurvers.
Non-recurvature is correctly predicted for 81% of the R-84h through PRNR cases. The
combined sample skill is 79%, but this total is dominated by skill in prediction of
straight track motion because of the larger number of non-recurver cases (445) than
recurver cases (337) in the sample population.


Table 12. TWO-GROUP DISCRIMINANT ANALYSIS MODEL: Percent of
recurvers and straight-movers in the sample population correctly
predicted by the two-group discriminant analysis model with the EOF modes
indicated. Number of classifications as recurvers or straight-movers are
provided with 12-h time resolution to indicate at what times this analysis
model succeeds or fails.

MODES: 1 3 5 36 41 24 14 38 43 6 39 27

                          CLASSIFICATION
VERIFY                  RECURVER   STRAIGHT

RECURVER    R-00H          52          3
(76%)       R-12H          53          3
            R-24H          47          8
            R-36H          37         15
            R-48H          31         15
            R-60H          21         20
            R-72H          14         18

STRAIGHT    R-84H           9         21
(81%)       R-96H           5         19
            PRNR           72        319

TOTAL (79%)
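
As a check on the percentages quoted for the two-group model, the counts in Table 12
can be tallied directly. The Python sketch below is illustrative only; the variable and
function names are assumptions, and the counts are those listed in the table.

```python
# Counts from Table 12: verification category -> (classified recurver, classified straight)
TABLE_12 = {
    "R-00h": (52, 3), "R-12h": (53, 3), "R-24h": (47, 8), "R-36h": (37, 15),
    "R-48h": (31, 15), "R-60h": (21, 20), "R-72h": (14, 18),
    "R-84h": (9, 21), "R-96h": (5, 19), "PRNR": (72, 319),
}
RECURVER_CATEGORIES = ["R-00h", "R-12h", "R-24h", "R-36h", "R-48h", "R-60h", "R-72h"]

def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100.0 * part / whole)

rec_correct = sum(TABLE_12[c][0] for c in RECURVER_CATEGORIES)
rec_total = sum(sum(TABLE_12[c]) for c in RECURVER_CATEGORIES)
str_correct = sum(v[1] for c, v in TABLE_12.items() if c not in RECURVER_CATEGORIES)
str_total = sum(sum(v) for c, v in TABLE_12.items() if c not in RECURVER_CATEGORIES)

pct_recurver = percent(rec_correct, rec_total)    # recurvers correctly identified
pct_straight = percent(str_correct, str_total)    # straight-movers correctly identified
pct_total = percent(rec_correct + str_correct, rec_total + str_total)
pct_r00 = percent(TABLE_12["R-00h"][0], sum(TABLE_12["R-00h"]))
pct_r72 = percent(TABLE_12["R-72h"][0], sum(TABLE_12["R-72h"]))
```

The tallies reproduce the 337 recurver and 445 non-recurver cases and the 76%, 81%
and 79% skill values given in the text, as well as the decrease from 95% at R-00h to
44% at R-72h.
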

A more useful distinction to the forecaster would be separation into high,
medium and low likelihood of recurvature (Table 13). The high group is defined as the
R-00h to R-24h cases, the medium group as all R-36h to R-72h cases, and the low
group as the R-84h through PRNR cases. Classification functions correctly classify the
sample into high, medium or low categories with 75%, 56% and 73% accuracy,
respectively. While this three-group classification scheme increases the time resolution in the
recurvature prediction, the ability to correctly classify track types during the 72-h
forecast period (77%) is less than that in the two-group model (79%). In addition, skill in
identifying straight-moving situations is degraded. The addition of another recurver
group can be viewed as increasing the number of correct classification categories for
recurver cases and increasing the number of incorrect classification categories for
non-recurvers. The intermediate group also increases the discriminant analysis model
separation between the 12-h data categories in the high recurver group and the
non-recurver categories.


Table  13.  THREE-GROUP  DISCRIMINANT  ANALYSIS  MODEL:  Percent  of 
recurvers  and  straight-movers  correctly  predicted  by  a  three-group 
discriminant analysis model. Format is similar to Table 12.

[Table 13 body is only partly legible in this copy. Unlabeled entries preceding the
R-84H row: 14, 24, IS, 3, 14, 13.]

LOW (73%)
  R-84H                  2          11          17
  R-96H              [...]           4          18
  PRNR                  13          89       [...]

TOTAL (77%)

Discriminant analysis into the ten 12-h classification groups R-00h to R-96h
plus PRNR (Table 14) maximizes the time resolution of the predictions for this data set,
but at the expense of classification accuracy. The ability to distinguish between recurver
and straight-track situations is only 72% as compared to the 79% and 77% accuracy
achieved by the two-group (Table 12) and the three-group (Table 13) forecast models,
respectively. Also, the ability to discern high, medium and low likelihood of recurvature
is 2-6% less than for the three-group model. The improvement in recurver classification
skill relative to skill in forecasting straight track situations between the two-group model
and three-group model is again noted. The ten-group model correctly classifies recurver
and straight situations with 79% and 67% accuracy, respectively. The greater difference
in recurver versus straight classification skill for the two-group model (6%) than for the
three-group model (2%) may be due to the increase in the ratio of the number of
recurver to straight groups from 2.0 (two-group model) to 2.3 (three-group model). As
expected, classification accuracy within each 12-h verification category is considerably
less than for the broader high-medium-low or recurver-straight categories. Higher skill
exists in correctly classifying cases at the extremes of the forecast continuum, i.e., at
recurvature (60%) and PRNR (48%). Skill in identifying cases in the intermediate
categories ranges from 15% to 38%.


Table  14.  TEN-GROUP  DISCRIMINANT  ANALYSIS  MODEL:  Percent  of 
recurvers  and  straight-movers  correctly  predicted  by  a  ten-group 
discriminant analysis model. Format is analogous to Tables 12 and 13.

The distributions of group centroids in discriminant space (Figs. 16, 17 and 18)
provide a graphic representation of the ability of the two-, three-, and ten-group
discriminant analysis models to separate, and thus classify, the cases belonging to each
group. Individual cases and group centroids may be located in discriminant space using
canonical correlation analysis techniques discussed in Section IV.A. An x- and
y-coordinate for each case is found by evaluating the first and second canonical
discriminant functions, respectively, using the predictor values for the case. Group
centroids are then located at the mean of the x-coordinates and the mean of the
y-coordinates for all cases in the classification group. The mean coordinates of the cases
in each 12-h verification category, hereafter referred to as time centroids, are also
computed for the two- and three-group models (Figs. 16 and 17). Because these time
centroids are equivalent to the group centroids for the ten-group model, they provide a
means of comparing the relative separation achieved by each of the three discriminant
analysis models.

As the 12-h time centroids are in sequential order, they reflect the time trends
in the patterns accompanying recurvature. Notice that the relative separations between
consecutive time centroids are generally similar along the first canonical discriminant
function for all three models (Figs. 16, 17 and 18). These relative distances between
group centroids indicate how well the model is able to distinguish between groups. The
proximity of the 12-h time centroids to their parent group centroid gives an indication
of the classification accuracy for each 12-h verification category or combination of 12-h
categories.
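
The location of cases and centroids in discriminant space can be sketched numerically.
The Python example below is a minimal illustration under stated assumptions, not the
BMDP7M computation used in this study: it assumes the canonical discriminant functions
are the generalized eigenvectors of the between-group and pooled within-group
sums-of-squares matrices, and it uses synthetic data (three groups, four predictors)
in place of the EOF coefficients. The function name and all data values are hypothetical.

```python
import numpy as np


def canonical_discriminant(X, labels):
    """Return (eigenvalues, case scores, group centroids) in discriminant space."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # pooled within-group sums of squares and cross-products
    B = np.zeros((p, p))  # between-group sums of squares and cross-products
    for g in groups:
        Xg = X[labels == g]
        mean_g = Xg.mean(axis=0)
        W += (Xg - mean_g).T @ (Xg - mean_g)
        d = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
    # Solve B v = lambda W v via a Cholesky factor of W so the problem is symmetric.
    L = np.linalg.cholesky(W)
    A = np.linalg.solve(L, B)            # L^-1 B
    M = np.linalg.solve(L, A.T)          # L^-1 B L^-T
    eigvals, U = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(eigvals)[::-1]    # largest eigenvalue first
    eigvals, U = eigvals[order], U[:, order]
    V = np.linalg.solve(L.T, U)          # back-transform to predictor space
    scores = (X - grand_mean) @ V        # column 0 = x-coordinate, column 1 = y-coordinate
    centroids = {g: scores[labels == g].mean(axis=0) for g in groups}
    return eigvals, scores, centroids


# Synthetic stand-in for three classification groups (not data from this study).
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [2, 0, 0, 0], [4, 1, 0, 0]], dtype=float)
labels = np.repeat([0, 1, 2], 50)
X = means[labels] + rng.normal(size=(150, 4))

eigvals, scores, centroids = canonical_discriminant(X, labels)
for g, c in centroids.items():
    print(g, np.round(c[:2], 2))
```

With three groups only the first two eigenvalues are non-negligible, which is consistent
with a two-dimensional plot for the three-group model.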

Since  the  number  of  canonical  discriminant  functions  computed  is  one  less  than 
the  number  of  groups,  a  one-dimensional  plot  is  presented  for  the  two-group  model  (Fig. 
16). The time centroids of the R-84h through PRNR verification categories that comprise
the straight-track group are all closer to the straight-group centroid than to the
recurver-group  centroid.  In  addition,  the  R-72h  time  centroid  (which  belongs  in  the 
recurver  group)  is  closer  to  the  straight-group  centroid.  While  the  actual  distribution 
of  individual  cases  determines  the  model  skill  reported  in  Table  12,  the  relative  positions 
of  the  time  and  group  centroids  in  Fig.  16  illustrate  why  the  model  is  better  at  correctly 
classifying  straight  track  cases  (81%)  than  it  is  at  correctly  identifying  recurver  cases 
(76%).  The  fitted  distributions  of  the  straight-mover  and  recurver  cases  confirm  these 
observations.  The  cases  in  the  straight  group  are  more  closely  distributed  about  their 
group  centroid  than  are  the  cases  in  the  recurver  group.  The  amount  of  overlap  between 
the  two  distributions  indicates  the  number  of  cases  that  may  be  misclassified  by  the 
two-group  model.  As  previously  noted,  canonical  discriminant  functions  can  be  used  to 
classify  by  computing  a  canonical  discriminant  function  score  (not  shown)  to  divide  the 
cases  into  straight-movers  and  recurvers.  Such  a  dividing  line  in  Fig.  16  would  give  an 
exact  representation  of  the  number  of  cases  that  would  be  misclassified  into  each  group 
by  the  first  canonical  discriminant  function. 

In the multiple-group discriminant analysis models, classification skill is a
function of how well an individual group is separated from its neighboring groups and
the actual distribution of its individual members. The three-group and ten-group model
centroids (Figs. 17 and 18) are spatially separated in a curvilinear fashion that reflects a
consistent time trend in the group centroids. These separation patterns explain why the
classification skill is highest for the groups on either end of the time spectrum. Consider
a normal distribution of sample cases for each group in an ellipsoidal pattern centered
around each group centroid. For a middle group with neighbors on either side,
classification skill is a function of its separation from neighboring groups and the distribution
of the individual cases belonging to the group. Since the end groups have no neighboring
group on one side, sample cases on the no-neighbor side of the distribution will be
classified into the end group. For both the three- and ten-group models (Tables 13 and
14), classification skill is notably higher in the end groups. Note that for the three-group
model, the skill is higher for the high likelihood of recurvature group than for the low
likelihood of recurvature group because the high group is better separated from the
medium group than is the low group. Differences in group classification skill for
the ten-group model are more difficult to interpret. For example, the R-00h to R-36h
group centroids are better separated in Fig. 18 than those for R-48h to R-96h. In general,
better separated groups in Fig. 18 demonstrate better classification skill. One exception
is the R-72h group, which has higher skill than the more separated groups and
may reflect the effect of the distribution of the sample cases on classification skill.

Fig. 16. Canonical discriminant function centroids for the two-group
model. Group centroids are located at the mean of all cases in each group (vertical
lines). Time centroids for PRNR through R-00h in 12-h intervals are indicated by
(O) for categories belonging to the straight group and (X) for categories belonging
to the recurver group. Fitted distributions of the straight-movers and recurvers
along the first canonical discriminant function axis illustrate the overlap of the
distributions.

Ultimately, the number and composition of classification groups must be a
trade-off between the forecaster's need to specify a precise time of recurvature and the
diminishing skill as more precision is attempted. To illustrate this trade-off in forecast
accuracy and time resolution, a ten-group model will be pursued using the entire 250
mb sample population. The ten-group model is chosen to fully test the predictive
capability of the EOF representation of the synoptic vorticity fields to predict recurvature at
the resolution of the data set.

Fig. 17. Canonical discriminant function centroids for the three-group
model. The first two canonical discriminant functions form the axes along which
group centroids (solid markers) and 12-h time centroids (open markers) are plotted
for high (circles), medium (triangles) and low (squares) likelihood of recurvature
classification groups.

Fig. 18. Canonical discriminant function centroids for the ten-group model. The
first two canonical discriminant functions form the axes along which group centroids
(solid circles) are plotted for R-00h through PRNR classification groups.

C.  APPLICATION 

Discriminant  analysis  packages  include  options  that  permit  flexibility  in  the  selection 
of  predictors  and  in  the  method  of  computing  the  classification  functions.  The  optimal 
application of these program features is, of course, a function of the goals of the analysis.
In this section, several discriminant analysis options available in BMDP7M are
considered. These features include prior probabilities, contrasts and three different
methods for entering predictors into the analysis. An in-depth discussion of all the features
available in computer packages and a comparison of BMDP7M with other
discriminant analysis packages can be found in Tabachnick and Fidell (1989).

1.  Adjusting  for  prior  probabilities  and  the  cost  of  misclassification 

The  prior  probability  is  the  probability  that  an  individual  case  selected  for  a 
group  is  actually  a  member  of  that  group.  Unless  otherwise  specified,  cases  are  assumed 
to  have  an  equal  probability  of  belonging  to  any  group,  and  the  classification  functions 
are  derived  with  equal  probabilities  of  misclassification.  Specifying  group  probabilities 
in  the  analysis  procedure  changes  the  ratio  of  the  probability  of  errors  by  adjusting  the 
discriminant  function  scores,  or  equivalently  by  adjusting  the  constant  terms  in  the 
classification  functions  to  achieve  a  ratio  of  errors  consistent  with  the  designated  prior 
probabilities.  Prior  probabilities  have  the  greatest  effect  on  classification  skill  when 
groups  are  relatively  indistinct  from  one  another. 
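
The adjustment of the constant terms can be illustrated with a short Python sketch. It
assumes the standard form of the adjustment for linear classification functions, in
which the natural logarithm of the prior probability is added to the constant term of
each group. The two classification scores below are hypothetical values chosen for
illustration, and the function name is not taken from any package.

```python
import math


def classify(scores, priors=None):
    """Pick the group with the highest classification function score.

    scores: dict of group -> score computed with equal prior probabilities.
    priors: optional dict of group -> prior probability; ln(prior) is added
            to each score, which is equivalent to adjusting the constant term.
    """
    if priors is None:
        adjusted = dict(scores)  # equal priors add the same constant to every group
    else:
        adjusted = {g: s + math.log(priors[g]) for g, s in scores.items()}
    return max(adjusted, key=adjusted.get)


# Hypothetical scores for a case that lies close to the boundary between two groups.
scores = {"R-36h": 4.1, "PRNR": 3.2}

print(classify(scores))                                 # equal priors
print(classify(scores, {"R-36h": 0.05, "PRNR": 0.50}))  # priors based on group size
```

A case with a much larger score difference between the two groups would be classified
the same way with or without the priors, which is why the adjustment matters most for
indistinct groups.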

In this study, prior probabilities could be assigned on the basis of the group
sizes in the sample or according to the climatological probability that a synoptic
situation belongs to each group. For example, prior probabilities based on the relative size
of each of the ten 12-h groups would range from 50% for the PRNR group to 3-7% for
the remaining groups. Thus, a discriminant analysis model including these prior
probabilities would only classify a case into one of the groups with a low prior probability if
there was very strong evidence that it was not in the PRNR group. Prior probabilities
could also be used to achieve some other desired ratio of errors that would be better
suited to the needs of the forecaster. That is, if the cost of misclassification of a certain
group was high, assigning a high prior probability to that group would decrease the
likelihood that a case belonging to that group would be misclassified into another group
with a lower prior probability.

Assigning  prior  probabilities  may  be  advantageous  in  future  applications  of 
discriminant analysis to fine tune the recurvature forecast model using EOF predictors.
Since  such  adjustments  to  the  analysis  may  not  give  a  true  indication  of  the 
discriminatory  power  of  the  EOF  coefficients,  prior  probabilities  will  not  be  specified  in 
this  study. 

2.  Contrasts  to  direct  stepwise  selection  of  predictors 

In discriminant analysis, a contrast is a series of coefficients, one for each
classification group, that modifies the stepwise selection of predictors. The coefficient for
each group indicates the relative amount of differentiation desired between groups.
Coefficients must be specified such that the sum of the coefficients for all groups equals
zero. Contrasts do not affect the computation of the classification functions as do
prior probabilities. Rather, contrasts affect the computation of the F-to-enter and
F-to-remove statistics. Therefore, they alter the stepwise selection of predictors such
that only those predictors that maximize the differences between groups are selected in
the analysis. Thus, contrasts can also be used to determine which predictors are
important in distinguishing between specific pairs of groups. In developing a ten-group
discriminant  analysis  model,  this  is  not  practical  because  of  the  large  number  of  pairwise 
tests  to  be  considered.  Furthermore,  it  may  not  be  a  sound  method  of  selecting  a  final 
set  of  predictors  because  the  predictors  that  are  useful  in  distinguishing  between  some 
pairs  may  adversely  affect  discrimination  between  other  pairs. 

Several contrasts are tested in Table 15. Because F-to-enter computations are
different for each analysis, stepwise selection of EOF modes is stopped after selection
of ten predictors. No contrasts are specified in the first analysis in Table 15 to allow
comparison with the different contrasts. The second analysis is designed to maximize
the difference between the recurvature cases and the straight-track cases. The third
maximizes distinction among all recurver situations (less than 72 h) and among all
straight situations equally. The fourth and fifth analyses are designed to maximize the
difference between recurvature and non-recurvature situations and also to enhance the
distinction among recurvature groups.

Maximizing the difference between R-00h and PRNR groups (line 2 in Table
15) improves the overall classification skill (I-score), but skill in distinguishing time to
recurvature (R-score) is less than using no contrasts. The contrast designed to increase
the differences equally among all recurver and straight groups (line 3) improves
discrimination among times to recurvature (R-score), but the overall classification skill
(I-score) is less than without contrasts (line 1). Contrasts designed with the additional
goal of improving the ability of the model to correctly forecast the time to recurvature
(lines 4 and 5) result in improvement in the time accuracy of recurver forecasts (R-score)
and in the ability to recognize recurver situations (%R = 82 and 85, respectively).
However, this is at the expense of weaker discrimination of straight-movers (%S = 67
and 64, respectively).

Table 15. COMPARISON OF MODELS USING CONTRASTS: Effect of various
contrasts on predictor selection and discriminant analysis model
performance. Predictor modes are in the order selected.

Except  for  EOF  Mode  1,  the  modes  selected  and  the  order  of  selection  vary  for 
the  various  contrasts  tested  in  Table  15.  Higher  mode  predictors  are  selected  earlier  in 
those  analysis  models  with  contrasts  that  are  designed  to  increase  differences  among 
multiple  groups  (lines  3-5)  than  in  the  analysis  model  using  a  simpler  contrast  between 
only  two  classification  groups  (line  2). 

In summary, specifications of different contrasts as in Table 15 do lead to improved
recurvature-related scores or straight-mover scores relative to a discriminant
analysis model without contrasts. However, the two scores are not improved simultaneously.
Due to the complexities of a ten-group model, the changes in predictors and
forecast skill produced by these contrasts are difficult to interpret. As in the specification
of prior probabilities, this analysis feature may be better utilized to fine tune an
EOF forecast model than to demonstrate the usefulness of EOFs in identifying
recurvature situations. Since the forecast skill without the use of contrasts (line 1) is
comparable to that with contrasts (lines 2 through 5), the contrasts feature will not be used
further in this study.

3. Direct, hierarchical and stepwise selection of predictors

Discriminant  analysis  model  performance  depends  primarily  on  the 
discriminatory  power  of  the  predictor  variables.  When  many  potential  predictors  are 
available,  such  as  the  first  45  EOF  coefficients  for  synoptic  vorticity  considered  in  this 
study,  the  question  is  which  combination  of  predictors  will  produce  the  best  distinction 
among  classification  groups  and  in  what  order  they  should  enter  the  analysis. 

Three  options  of  selecting  and  entering  predictors  in  the  discriminant  analysis 
are  the  direct,  hierarchical  and  stepwise  methods.  In  the  direct  method,  the  predictors 
are selected by the user and all are entered into the analysis in one step. In the
hierarchical method, the user specifies both the predictors and the order they enter the
analysis. The stepwise method relies on statistical criteria specified by the user to select the
predictors  and  determine  their  order  of  entry. 

The direct and hierarchical discriminant analysis methods are advantageous because
they allow the user to control the predictors in the analysis. However, they require
prior knowledge of the relative discriminatory value of each potential predictor. Except
for EOF Mode 1, which is shown in Section II.A.2 to represent a straight-mover
situation with the storm in the monsoon trough or a recurvature situation depending upon
the value of the coefficient, little can be inferred about the potential discriminatory
power of the increasingly complex patterns for the EOF coefficients. Therefore, a
stepwise discriminant analysis will be used to select the most significant predictors.

D.  FINAL  MODEL  DEVELOPMENT 

The  final  model  to  predict  time  to  recurvature  with  12-h  resolution  is  developed  with 
a  stepwise  analysis  of  the  entire  sample  population.  Potential  predictors  are  selected 
from  the  first  45  EOF  coefficients  representing  the  250  mb  synoptic  vorticity  fields.  No 
prior  probabilities  and  no  contrasts  are  specified. 

The final question is what criterion should limit the number of predictors selected in the
stepwise analysis. Mathematically, the maximum number of predictors is equal to the
total number of cases in the sample minus two (Klecka 1980). Although all 45 EOF
coefficients could be used to develop a discriminant analysis model for these data, the
objective is to obtain the best classification skill with the fewest possible predictors.
Such a parsimonious solution is sought by performing ten stepwise discriminant analyses
(Table 16). The first analysis is restricted to one mode, the second is restricted to two
modes, and so on until ten modes are selected in the tenth analysis model. Low
F-to-enter (1.0) and F-to-remove (0.996) values are specified in the analysis procedure to
ensure the selection of up to ten predictors. However, the F-to-remove values (not
shown) for the modes selected in the analyses in Table 16 indicate that each of the
selected modes is significant to the separation of recurver groups at the 0.01 level or
better.


Table 16. STEPWISE SELECTION OF ONE TO TEN MODES: Classification
skill in terms of I-score, R-score and percent correctly classified recurver
(R-00h to R-72h) or straight (R-84h to PRNR) for ten discriminant analyses
that are limited to one to ten modes successively. Jackknifed results
(discussed in Section IV.B.1) reflect the skill expected with
independent testing.

An optimal forecast model is selected from Table 16 by examining gains in
classification skill as the number of modes in the analysis is increased from one to ten. As
expected, the jackknifed results in columns six through ten are worse than the learning
model results in columns one through five. The general trend is toward improved skill
(smaller penalty scores and higher percent correct classifications) as the number of
modes increases from one to ten. The seven-mode discriminant analysis model best
meets the analysis objectives because the addition of Mode 24 as the seventh mode
improves all measures of classification skill relative to the skill for the models with six or
fewer modes. However, the addition of Mode 45 as the eighth mode results in
degradation in all measures of skill. The addition of Mode 9 in the nine-mode model produces
only a slight improvement over the eight-mode analysis and only results in skill scores
nearly equal to those for the seven-mode analysis. Some improvement is again noted
by the addition of a tenth predictor, but the %R (77) is less than the %R (80) for the
seven-mode model. The seven-mode discriminant analysis model is also preferable
because predominantly low-numbered EOF modes are used in the analysis. Since these
lower modes represent large-scale patterns and account for a larger fraction of the
variance in the synoptic vorticity fields than the higher modes, the coefficients for the lower
mode EOFs should be better discriminators of recurvature than those for higher modes.
Furthermore, the higher mode predictors may represent noise in the synoptic field, yet
be statistically useful in predicting recurvature in this data set. Based on these
considerations, the seven-mode model in Table 16 is chosen to demonstrate the potential of
discriminant analysis of the EOF representation of synoptic vorticity fields to forecast
time to recurvature with 12-h resolution.
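
The difference between learning-sample and jackknifed skill can be illustrated with a
small Python sketch. It assumes that the jackknife is a leave-one-out procedure (each case
is classified with functions derived from all other cases) and that classification uses
equal prior probabilities and a pooled within-group covariance matrix. The data are
synthetic (two groups, three predictors), and the function names are hypothetical rather
than BMDP7M routines.

```python
import numpy as np


def fit_lda(X, y):
    """Group means and inverse pooled within-group covariance matrix."""
    groups = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in groups])
    resid = X - means[np.searchsorted(groups, y)]
    pooled = resid.T @ resid / (len(X) - len(groups))
    return groups, means, np.linalg.inv(pooled)


def predict_lda(model, X):
    """Classify each case into the group with the smallest Mahalanobis distance."""
    groups, means, inv_cov = model
    diff = X[:, None, :] - means[None, :, :]               # (cases, groups, predictors)
    d2 = np.einsum("ngp,pq,ngq->ng", diff, inv_cov, diff)
    return groups[np.argmin(d2, axis=1)]


def jackknife_accuracy(X, y):
    """Leave-one-out fraction correct: each case is withheld from its own model."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = fit_lda(X[keep], y[keep])
        hits += int(predict_lda(model, X[i:i + 1])[0] == y[i])
    return hits / len(X)


# Synthetic stand-in for two classification groups (not data from this study).
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 3))
X[y == 1, 0] += 3.0

learning = float(np.mean(predict_lda(fit_lda(X, y), X) == y))
jackknifed = jackknife_accuracy(X, y)
print(f"learning-sample fraction correct: {learning:.3f}")
print(f"jackknifed fraction correct:      {jackknifed:.3f}")
```

Repeating such a comparison while the number of predictors is increased is one way to
see where additional modes stop improving the skill expected on independent data.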

E.  FINAL  MODEL  EVALUATION 

The  discriminant  analysis  model  derived  in  Section  IV.D  from  the  stepwise  selection 
of  seven  EOF  coefficients  of  250  mb  vorticity  is  indicative  of  the  forecast  skill  obtainable 
with  this  analysis  method  and  these  data.  In  this  section,  the  final  discriminant  analysis 
model is evaluated and compared to the Euclidean distance model derived in Section
III.B. Since both the discriminant analysis and Euclidean distance models were derived
from  the  250  mb  vorticity  data,  the  comparison  is  between  the  two  analysis  methods. 

1.  Forecast  skill 

The classification matrix for the final discriminant analysis model is presented
in Table 17. The model correctly identifies synoptic situations that will lead to
recurvature within the next 72-h forecast period with 80% accuracy. Skill for straight-track
situations is 66%. Thus, there is a greater chance of a false alarm of recurvature than
a missed recurvature prediction. The combined skill in predicting track type is 72%.
Group classification skill is best near recurvature (R-00h = 60%, R-12h = 29%, and
R-24h = 29%) and in the straight-track categories (R-96h = 29% and PRNR = 47%).
Skill in the intermediate categories ranges only from 7% to 22%. This result was anticipated
based on initial testing with the ten-group discriminant analysis model in Section IV.B.3
(Table 14). The ten-group model in Section IV.B.3 is the eleven-mode model that would
have been included in Table 16 if the stepwise selection of one to ten modes had been
carried one step further. It was derived using the same analysis options from the
stepwise selection of the same seven modes in the final discriminant analysis model plus
Modes 45, 9, 14 and 41. Canonical discriminant functions for the final discriminant
analysis model (not shown) are nearly identical to those in Fig. 18 for the ten-group
model and show relatively little separation among the centroids for the intermediate
R-84h through R-36h groups.


Table 17. CLASSIFICATION MATRIX FOR THE TEN-GROUP
(SEVEN-MODE) MODEL: Classifications for data in each 12-h verification
category and the percent correctly forecast by the ten-group
(seven-mode) discriminant analysis model. Percent of recurvers and
straight-movers correctly predicted is also listed.

Bar charts (Fig. 19) of the percent of cases in each 12-h verification category
that are classified into each group further illustrate the relatively poor ability of the
model to pinpoint the time to recurvature among the R-84h through R-36h cases. The
intermediate 12-h categories not shown in Fig. 19 tend to have characteristics
intermediate to the 24-h bar charts. Cases belonging to the better separated groups (R-00h,
R-24h, R-96h, and PRNR in Fig. 19) are more frequently correctly classified or classified
within one to two 12-h groups of the correct classification group than those belonging
to the less separated groups (R-48h and R-72h in Fig. 19). The R-36h (not shown), and
to a lesser extent the R-48h through R-72h cases, are classified into all groups with
nearly the same frequency, which reflects little ability to correctly distinguish the time
to recurvature for these synoptic situations. Notice that only 28% (30%) of the R-48h
(R-36h) cases were misclassified into straight-track groups.

2.  Additional  model  output  to  assist  the  forecaster 

The discriminant analysis model classifies an individual case into the group that
has the highest classification function score (discussed in Section IV.A). Discriminant
analysis provides additional information that may assist the forecaster in subjectively
assessing the validity of a model forecast. These outputs are the Mahalanobis distances
and the posterior probabilities.

Fig. 19. Classification bar charts at 24-h intervals for the ten-group (seven-mode)
model. Percent of N cases (ordinate) verifying as R-00h (top left), R-24h (top
right), R-48h (middle left), R-72h (middle right), R-96h (bottom left), and PRNR
(bottom right) that are classified into each group R-00h through PRNR (abscissa).
Shaded bars indicate the percent in the correctly classified category.

The Mahalanobis distance (D²) is the squared distance of an individual case to
each group centroid. Since D² has the same properties as the chi-squared (χ²) statistic
with degrees of freedom (df) equal to the number of predictors, Mahalanobis distances
are measured in chi-square units.
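
The distance described above can be sketched in a few lines. This is a minimal illustration, not the BMDP7M implementation: it assumes the distance is taken with respect to a pooled within-group covariance matrix, and the function names (`solve_linear`, `mahalanobis_sq`) and the numbers in the example are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is not modified.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq(case, centroid, pooled_cov):
    """Squared Mahalanobis distance D^2 = d^T C^-1 d, with d = case - centroid."""
    d = [x - c for x, c in zip(case, centroid)]
    y = solve_linear(pooled_cov, d)  # y = C^-1 d without forming the inverse
    return sum(di * yi for di, yi in zip(d, y))


# Hypothetical two-predictor example (illustrative values only).
centroid = [10.0, -5.0]
pooled_cov = [[4.0, 0.0], [0.0, 9.0]]
print(mahalanobis_sq([12.0, -2.0], centroid, pooled_cov))  # (2*2)/4 + (3*3)/9
```

With uncorrelated predictors, as in the example, D² reduces to the sum of squared departures from the centroid, each scaled by the predictor variance.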

The posterior probability is the probability that an individual case belongs to a
group, which is calculated from D² by assuming the cases in each group are clustered
around the centroid in a multivariate normal distribution and that every case belongs to
one of the groups. Posterior probabilities are more useful to the forecaster than the
distances because a set of nearly equal (small) percentage values for the time categories
indicates the likely uncertainty in time to recurvature. Because posterior probabilities
are used subjectively in this study, their contribution to forecast skill will not be
evaluated in this section.

3.  Comparison  of  discriminant  analysis  and  Euclidean  distance  models 

Classification skill for the final ten-group discriminant analysis model and the
Euclidean distance model is compared in Table 18. Although the learning data sets differ,
the skill scores reflect the ability of each model to forecast all 782 cases. As expected,
the discriminant analysis model (line 1) outperforms the Euclidean distance
model in all areas except %S (discriminant analysis = 66, Euclidean distance = 68).
Because the learning set for the Euclidean distance model comprises only 161 cases,
the results for this model (line 3) are predominantly independent test results. Thus, a
more equitable comparison is between the jackknifed results for the discriminant analysis
(line 2) and the Euclidean distance model. In this comparison, the discriminant analysis
model still outperforms the Euclidean distance model in all areas except %S (jackknifed
discriminant analysis = 65). The conclusion that discriminant analysis is a better
method for exploiting the predictive capability of the EOF coefficients is based on the
relative performance of the two forecast models. The Euclidean distance method is an
intuitive, and thus more subjective, method of forecasting tropical cyclone recurvature
using EOF predictors of synoptic vorticity. Discriminant analysis is a statistically based,
and thus more objective, method for classifying the time to recurvature. Both analyses
demonstrate skill compared to the climatological model (line 4), in which predictions are
based on the relative historical frequency of occurrence of each 12-h classification group
in straight and recurving best track data.

Table 18. COMPARISON OF MODEL FORECAST SKILL: Classification skill
in terms of 1-score, R-score and percent correctly classified recurver
(R-00h to R-72h) or straight (R-84h to PRNR) for the final ten-group
discriminant analysis model and the Euclidean distance model based on
250 mb vorticity EOF coefficients and climatological forecasts based on
1979-1984 best track data. Discriminant analysis jackknifed results
(discussed in Section IV.B.1) reflect the skill expected with independent
testing.
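
The percent-correct measures in Table 18 depend on collapsing the ten groups into a recurver type (R-00h to R-72h) and a straight type (R-84h to PRNR). The sketch below shows one way such a tally could be made. The function names and the example labels are hypothetical, and it does not reproduce the other skill scores in the table.

```python
# Groups counted as recurver in Table 18 (R-00h to R-72h); every other group
# (R-84h, R-96h, PRNR) is counted as straight.
RECURVER_GROUPS = {"R-00h", "R-12h", "R-24h", "R-36h", "R-48h", "R-60h", "R-72h"}


def track_type(group):
    """Collapse a ten-group label to 'recurver' or 'straight'."""
    return "recurver" if group in RECURVER_GROUPS else "straight"


def percent_correct(verifying, forecast):
    """Percent of cases whose forecast track type matches the verifying type."""
    hits = {"recurver": 0, "straight": 0}
    totals = {"recurver": 0, "straight": 0}
    for v, f in zip(verifying, forecast):
        kind = track_type(v)
        totals[kind] += 1
        if track_type(f) == kind:
            hits[kind] += 1
    result = {k: 100.0 * hits[k] / totals[k] for k in totals if totals[k] > 0}
    n = sum(totals.values())
    if n > 0:
        result["all"] = 100.0 * sum(hits.values()) / n
    return result


# Hypothetical labels for illustration only (not data from this study).
verifying = ["R-00h", "R-48h", "R-84h", "PRNR"]
forecast = ["R-12h", "R-96h", "PRNR", "R-60h"]
print(percent_correct(verifying, forecast))
```

In this tally a forecast of R-12h for a case verifying as R-00h counts as a correct track-type forecast even though the 12-h group is wrong, which is why the track-type percentages are much higher than the 12-h group percentages.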
F. VIOLATION OF ASSUMPTIONS

Although discriminant analysis is a robust procedure, the analysis results may be
adversely affected by violations of the requirements or assumptions for applying the
method:

1. two or more distinct groups must be specified;
2. at least two cases must be present in each group;
3. the number of discriminating variables must be less than the total number of cases
   minus two;
4. the discriminating variables must be measured such that the differences between
   successive values are always the same (that is, on an interval scale);
5. no discriminating variable may be a linear combination of the other
   discriminating variables;
6. the variance-covariance matrices must be approximately equal for each group; and
7. the group distributions must be multivariate normal.

Effects of violating the seven discriminant analysis assumptions are explained in detail
in Klecka (1980), who notes that the best guide for a prediction model is the percentage
of correct classifications. If the percentage is high, any violations were not harmful. If
the percentage is low, it could be due to the violation of the assumptions or to weak
discriminating variables.

The first four assumptions are met by the data in this study. The BMDP7M program
incorporates tolerance criteria in the stepwise selection of discriminating variables
that protect against multicollinearity (fifth assumption). Homogeneity of
the variance-covariance matrices (sixth assumption) is more important in classification
than in statistical inference, because cases tend to be over-classified into the more
dispersed groups. Homogeneity of the variance-covariance matrices is tested by
examination of the group standard deviations for each predictor and by inspection of the
scatter plots of the first two canonical function scores for the cases in each group. The
ten group standard deviations have no gross discrepancies in predictor variance. The
largest differences in the variances are observed in Mode 1, and range from 39.2 for the
R-00h group to 15.4 for the R-96h group. The canonical discriminant function scatter
plots for each group (not shown) have roughly equal dispersion, which indicates that the
variance-covariance matrices are approximately homogeneous.

Testing the multivariate normality (seventh assumption) of all linear combinations
of the sample predictors is not currently feasible (Tabachnick and Fidell 1989). However,
discriminant analysis is robust to violations of normality if they are caused by
skewness rather than outliers. To test for outliers, the Mahalanobis distance from each
group centroid to its member cases is evaluated as χ² with degrees of freedom equal to
the number of predictors. Only three of the 782 cases in the sample population (Table
19) exceed the critical χ² = 24.32 at α = 0.001 with seven df. These three outliers are from
recurving storms at R-00h or R-12h and two of them are from the Euclidean distance
clean set storms (TY Vernon and ST Forrest). Eliminating the three multivariate
outliers from the discriminant analysis (not shown) does not appreciably change the
classification accuracy for the ten 12-h groups, regardless of whether the same seven
predictors are hierarchically entered into the analysis or seven new predictors are selected
in a stepwise fashion. However, the exclusion of these three cases from the sample
population causes subtle changes in the F-to-enter statistics for each predictor. For
example, the first seven modes selected in the stepwise analysis are Modes 1, 2, 3, 5, 4, 45,
and 6 instead of Modes 1, 2, 3, 5, 4, 6, and 24. Such multivariate outliers should be
eliminated from the analysis to develop an operational forecast model. Since one goal
of this study is to compare the classification skill for the discriminant analysis model
with the Euclidean distance model that was derived using two of these cases, the
outliers are not excluded from the final discriminant analysis model.
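
The outlier screen described above compares each case's D² with a chi-square critical value. The following sketch evaluates the chi-square upper-tail probability for integer degrees of freedom using the standard closed-form series, and flags a case when that probability falls below α. The function names are hypothetical, and the defaults (seven df, α = 0.001) simply mirror the values quoted in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)  # j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)  # ratio of the (j+1)th to the jth term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def is_outlier(d2, df=7, alpha=0.001):
    """Flag a case whose squared Mahalanobis distance is improbably large."""
    return chi2_sf(d2, df) < alpha


# The tail probability at the critical value quoted in the text is close to 0.001.
print(chi2_sf(24.32, 7))
print(is_outlier(41.1), is_outlier(10.0))
```

Evaluating the tail probability directly avoids the need for a table lookup when the number of predictors changes.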


Kachigan (1982) questioned whether discriminant analysis is an appropriate analysis
technique for the dichotomization of a continuous criterion variable, such as time to
recurvature in this study. Although the recurvature and non-recurvature samples
represent distinct sets, the synoptic situations that lead to recurvature do evolve
continuously in time and thus may not be easily distinguished. Regression analysis may be a
more powerful and efficient analysis procedure since the regression method would fully
utilize the time resolution of the observed data and the time trends in the EOF predictors
to model the time to recurvature as a continuous variable. The ten-group discriminant
analysis model also makes full use of the time resolution of the data, but without any
data transformations that might be required to meet the linearity assumptions of the
regression model. Thus, discriminant analysis provides an efficient first look at the
ability of EOF coefficients of synoptic vorticity to predict time to recurvature.

Table 19. MULTIVARIATE OUTLIERS: Cases for which the Mahalanobis distance
to the group centroid exceeds the critical χ² value of 24.3 for the
final discriminant analysis model.

STORM    STORM        VERIFICATION   MODEL      MAHALANOBIS DISTANCE TO
NO/YR    NAME         CATEGORY       FORECAST   VERIFICATION GROUP CENTROID
1679     TY LOLA
2260     TY VERNON
1163     ST FORREST   R-00H                     41.1

V. FORECAST EXAMPLES


Operational application of a discriminant analysis model using EOF predictors to
forecast tropical cyclone recurvature is relatively simple. Only a personal computer or
programmable calculator would be required to interpolate the analyzed wind fields onto
the storm-centered grid, compute the vorticity at the gridpoints, calculate the EOF
coefficients corresponding to the vorticity field, and solve for the classification function
scores and posterior probabilities for each of the model's classification groups. In
this section, forecast examples from the learning set are presented. The use of posterior
probabilities to assess the validity of a forecast is also discussed.
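
The operational steps listed above, after the vorticity has been computed on the storm-centered grid, might be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the eigenvectors are assumed orthonormal, the field is assumed to be projected as a departure from a mean field, and the classification functions are assumed linear in the EOF coefficients. All names and numbers are hypothetical.

```python
def eof_coefficients(vorticity, mean_field, eigenvectors):
    """Project a flattened gridded vorticity field onto each EOF eigenvector."""
    anomaly = [v - m for v, m in zip(vorticity, mean_field)]
    return [sum(a * e for a, e in zip(anomaly, vec)) for vec in eigenvectors]


def classification_scores(coeffs, functions):
    """Score each group as constant + weights . coeffs (assumed linear form)."""
    return {
        group: const + sum(w * a for w, a in zip(weights, coeffs))
        for group, (const, weights) in functions.items()
    }


def classify(scores):
    """Return the group with the highest classification function score."""
    return max(scores, key=scores.get)


# Toy four-point grid with two eigenvectors (illustrative values only).
eigenvectors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
mean_field = [1.0, 1.0, 1.0, 1.0]
vorticity = [3.0, -1.0, 2.0, 0.0]
functions = {"R-00h": (0.5, [1.0, 0.0]), "PRNR": (0.0, [0.0, 1.0])}

coeffs = eof_coefficients(vorticity, mean_field, eigenvectors)
scores = classification_scores(coeffs, functions)
print(coeffs, scores, classify(scores))
```

Each step is a small number of multiply-and-add operations per gridpoint and per group, which is consistent with the modest computing requirement noted above.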

A.  TEST  CASES 

The final discriminant analysis model forecasts are presented for the 1984 examples
(Fig. 1) of a recurver (ST Vanessa), a straight-mover (TY Agnes) and an odd-mover (ST
Bill). The forecast skill for these three storms is typical of other storms in the data set.

1.  Recurver 

The final discriminant analysis model forecasts of the time to recurvature for
ST Vanessa are shown in Fig. 20. ST Vanessa tracked along the southern side of the
subtropical ridge, which had redeveloped in the wake of TY Tad, for nearly five days
before recurving (ATCR 1984). Only the two discriminant analysis model forecasts of
R-72h for times greater than 96 h before recurvature are clearly erroneous. All forecasts
within 72 h of recurvature are correct predictions of a recurver-track type. Although
only three of the seven recurver-track type forecasts are correct (R-00h, R-48h and
R-72h), the forecasts all progress in a sequential manner toward recurvature (R-72h,
R-72h, R-48h, R-48h, R-36h, R-36h, R-00h). The 12-h forecast sequences for most of
the recurving storms in the sample have a similar progression. Although a prediction
may be repeated at successive 12-h forecasts and one or more sequential classification
groups may be skipped between successive forecasts, the predictions tend correctly toward
recurvature. Such a consistent trend toward recurvature in successive operational
forecasts would add confidence to the individual 12-h recurving-track forecasts.
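
A forecaster could check such a trend mechanically. The sketch below is a hypothetical helper, not part of the forecast model: it treats a sequence of successive 12-h forecasts as trending toward recurvature when the predicted time to recurvature never increases, so repeated and skipped groups are both allowed.

```python
def hours(group):
    """Convert a label such as 'R-48h' to hours; PRNR is treated as beyond 96 h."""
    return float("inf") if group == "PRNR" else int(group[2:-1])


def trends_toward_recurvature(forecasts):
    """True if the predicted time to recurvature never increases between forecasts."""
    h = [hours(g) for g in forecasts]
    return all(later <= earlier for earlier, later in zip(h, h[1:]))


# Sequence of recurver-track forecasts quoted in the text for ST Vanessa.
vanessa = ["R-72h", "R-72h", "R-48h", "R-48h", "R-36h", "R-36h", "R-00h"]
print(trends_toward_recurvature(vanessa))
```

The check says nothing about how abrupt the steps are, so a sudden jump from R-96h to R-24h would still pass and would need to be judged separately.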

2.  Straight-mover 

The final discriminant analysis model forecasts for TY Agnes are presented in
Fig. 21. TY Agnes tracked west-northwest under the influence of an easterly steering
flow along the south side of a broad mid- to low-level subtropical ridge that extended
from the dateline west to the coast of Vietnam (ATCR 1984). Seven of the nine forecasts
correctly predicted straight-track motion during the 72-h forecast period. Two
forecasts of recurvature in 60 h are mispredictions of the track type. These two R-60h
forecasts were made 48 and 60 h before landfall in Vietnam and 72 and 84 h before the
subsequent dissipation.

Fig. 20. Time-to-recurvature forecasts for ST Vanessa. Discriminant
analysis model forecasts (top number) and verifying time (bottom) to recurvature (h)
at the JTWC best track 00 and 12 UTC positions during 22-31 October 1984 (dots).
The letters PR indicate a pre-recurvature situation of more than 96 h prior to
recurvature.

3.  Odd-mover 

The forecast model in this study was not designed to distinguish odd-mover
behavior such as loops and stairstep tracks. Therefore, forecasts based on the vorticity
fields preceding or during erratic motion cannot provide accurate information on the
storm's track. However, classifications may indicate storm motion if the next segment
of the track fits either of the model's straight or recurver track categories.

The time-to-recurvature forecasts for ST Bill are shown in Fig. 22. Although
ST Bill was expected to recurve in a manner similar to ST Vanessa, the complex
environmental steering associated with an interaction with TY Clara caused Bill to track
southeastward before dissipating east of the Philippines (ATCR 1984). In the first 48 h
after the tropical cyclone formation alert (TCFA), Bill tracked slowly in a 25 n mi (46 km)
diameter cyclonic loop. Although the next track segment is straight, forecasts during
Bill's first loop predict recurvature in 60 h (first forecast) to 72 h (second through fourth
forecasts). Once the erratic looping is completed, the model correctly identifies the
straight-track segment in the next eight forecasts. As Bill began to recurve around the
western end of the subtropical ridge, the midlatitude trough passed to the north and
weakened the ridge, which slowed Bill's progress. The intense low-level circulation in the
Philippine Sea associated with TY Clara, and the strengthening northeast monsoon flow,
forced Bill to the southeast in an anticyclonic loop, and Bill rapidly weakened.

Fig. 21. Time-to-recurvature forecasts for TY Agnes. Discriminant analysis
model forecasts of time to recurvature (h) at the JTWC best track 00 and 12 UTC
positions (dots) during 1-8 November 1984. Definition as a straight-moving storm
requires a minimum of 72 h after the forecast time to ensure verification as a
straight-mover. PRNR refers to the forecast model classification group for recurving
cases more than 96 h prior to recurvature time and straight-moving cases.

This set of model forecasts is unusual in that there is a sudden transition from
straight-track predictions (R-96h) to the recurver predictions (R-24h, R-12h and R-00h).
The model forecasts for recurving storms tend to transition more appropriately through
successive recurvature classification categories. The model classifies the vorticity fields
during the anticyclonic loop as recurvature (R-00h) situations. Since the forecast model
is unable to predict looping or southeast motion, synoptic situations after Bill's would-be
recurvature time (fifth R-00h forecast) are classified into the most similar of the ten
straight plus recurver groups. While these recurvature forecasts correctly predict the
recurvature-like motion as Bill moves northwest and then north and northeast, there is
no indication in the model forecasts that Bill will subsequently loop toward the
southeast. The last two forecasts of R-12h and R-36h are based on the synoptic
situation associated with Bill's southeast motion and precede a small cyclonic loop. These
last forecasts indicate that the situation has changed, but continue erroneously to predict
recurvature.

Fig. 22. Time-to-recurvature forecasts for ST Bill. Discriminant analysis model
forecasts of time to recurvature (h) are indicated at the JTWC best track 00 and 12
UTC positions (dots) during 8-21 November 1984.


B.  POSTERIOR  PROBABILITIES  AS  AN  AID  IN  THE  FORECAST  DECISION 
The posterior probability is the probability that an individual case belongs to a
group. The probabilities for all groups sum to one. The posterior probability (P) that
case i belongs to group j is computed from the Mahalanobis distance (D²) or directly
from the classification function score (S) for the ith case for the jth group:

$$P_{ij} = \frac{\exp(S_{ij})}{\sum_{k=1}^{g} \exp(S_{ik})} \tag{5.1}$$

where the sum is taken over the g classification groups.


Posterior probabilities can be used subjectively by the forecaster to assess the
likelihood that a classification is correct. If the posterior probability for one classification
group is high relative to the probabilities for the remaining groups, the forecaster can
have more confidence that the model forecast is correct. If the posterior probability for
the classification group is low and nearly equal to the probabilities for one or more of
the other groups, then the forecaster should have less confidence in the prediction.
Posterior probabilities can also be useful when a classification is repeated at successive
12-h forecasts to indicate whether the forecast is more or less likely to be correct.
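
Eq. 5.1 can be evaluated directly from the classification function scores. The sketch below is a minimal illustration with hypothetical names and scores. It subtracts the largest score before exponentiating, which leaves the ratio in Eq. 5.1 unchanged but avoids overflow when the scores are large.

```python
import math


def posterior_probabilities(scores):
    """Eq. 5.1: P_j = exp(S_j) / sum_k exp(S_k), computed in a stable form."""
    top = max(scores.values())
    weights = {group: math.exp(s - top) for group, s in scores.items()}
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}


# Hypothetical scores for three of the groups (illustrative values only).
probs = posterior_probabilities({"R-00h": 2.0, "R-12h": 1.5, "PRNR": 0.0})
print(probs)
print(sum(probs.values()))
```

When two or more groups have nearly equal scores, their posterior probabilities are nearly equal as well, which is the low-confidence situation described above.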

The posterior probability would be more useful if some cutoff value existed that
would indicate the forecast was likely to be correct. To examine whether this is the case
for the discriminant analysis model, posterior probabilities for all cases classified into
each 12-h forecast category are plotted as a function of the actual verification categories
(Fig. 23). The ranges of the posterior probabilities vary with the forecast classification
group. Probabilities are highest for the R-00h and PRNR forecasts and are lowest for
the R-36h through R-84h forecasts. Unfortunately, the posterior probabilities are not
distinctly higher for the correct predictions than for the incorrect predictions. Posterior
probabilities for correct classifications are most distinct from incorrect classifications
when PRNR is forecast. Therefore, posterior probabilities are most useful in evaluating
PRNR forecasts.

Posterior probabilities for recurving storm ST Abby are presented in Table 20. ST
Abby continually tracked to the right of the 1983 JTWC official forecasts (ATCR 1983).
Although the JTWC forecast aids and numerical progs had consistently indicated a
west-northwest track for Abby, the subtropical ridge over Japan never intensified as
anticipated and Abby recurved to the northeast. Sandgathe (1987) cites ST Abby as an
unusual example of a cyclone-subtropical ridge interaction, defined as a
"through-the-ridge" case, in which the cyclone unexpectedly moves through an apparently
well-established subtropical ridge.

Fig. 23. Posterior probabilities of classifications into time-to-recurvature
groups. Posterior probabilities (ordinate) for the N cases forecast as R-00h (top
left), R-24h (top right), R-48h (middle left), R-72h (middle right), R-96h (bottom
left), and PRNR (bottom right) plotted in the verifying groups R-00h through
PRNR (abscissa). Vertical lines indicate the correct classifications.

On 5 August 1983, the discriminant analysis model correctly forecasts Abby's
straight-track motion during the next 72 h and the posterior probability (35%) is
relatively high. Referring to Fig. 23, only one case in the learning set had a PRNR forecast
with a posterior probability greater than 35% and then recurved within the 72-h forecast
period (R-12h). Therefore, the 35% posterior probability indicates that it is highly likely
that the PRNR forecast is correct. Similarly, the second PRNR forecast has a relatively
high posterior probability for the PRNR classification, which indicates the reliability of
the PRNR forecast. Although the third PRNR forecast correctly predicts straight-track
motion during the next 72-h period, the posterior probability that it belongs to that
group is only 21%. Based on the learning set results in Fig. 23, a forecaster would have
relatively less confidence in this PRNR forecast (line 3) than in the previous two PRNR
forecasts (lines 1 and 2). However, the model continues to predict Abby as a
straight-mover (or at least 84 h to recurvature) throughout the remainder of the recurvature
period. The small posterior probability values indicate that the erroneous straight-track
PRNR predictions are not likely to be correct.


Table 20. DISCRIMINANT ANALYSIS MODEL FORECASTS FOR ST
ABBY: Month-day-times from 5-9 August 1983 are indicated in the
DTG column. Verification times to recurvature are given in the VERF
column. The prediction of the most likely classification group (time to
recurvature in hours or PRNR) is based on the highest classification
function score and corresponds to the highest posterior probabilities
given in the columns labeled 00 through PRNR.

                    MODEL CLASSIFICATION GROUP (posterior probability, %)
DTG     VERF  PRED   00   12   24   36   48   60   72   84   96  PRNR
080500  PRNR  PRNR    2    4    4    5    6    8   10   13   13    35
080512  96    PRNR    5    6    5    6    6    8    9   11   11    34
080600  84    PRNR   14   10    6    5    6    9   10   10    9    21
080612  72    PRNR   10    6    5    4    6   12   14   13   13    16
080700  60    PRNR    7    6    6    £    7   14   15   14   12    15
080712  48    PRNR    2    2    4    £    9   14   14   17   14    19
080800  36    84      1    2    5    8    9   10   12   19   15    18
080812  24    PRNR    0    2    £    7    7    8   11   21   17    21
080900  12    84      0    1    4    6    6    8   11   24   18    23
080912  00    84      1    2    4    6    7   12   16   22   15    15

VI. SUMMARY AND CONCLUSIONS


The  feasibility  of  using  an  empirical  orthogonal  function  (EOF)  representation  to 
identify  the  synoptic  vorticity  associated  with  tropical  cyclone  recurvature  is  examined. 
Recurvature, which is defined as a change in storm heading from west to east of 000°
N, is evaluated from the Joint Typhoon Warning Center best track positions. In this
EOF  approach,  the  vorticity  field  is  represented  by  the  sum  of  45  orthogonal 
eigenvectors  that  represent  spatial  patterns.  Time-dependent  coefficients  are  derived 
that  indicate  the  importance  of  each  pattern  in  the  map  series.  The  EOF  coefficients  are 
derived  by  Gunzelman  (1990)  from  the  12-hourly  U.S.  Navy  Global  Band  Analyses  at 
700,  400  and  250  mb  for  1979-1984  western  North  Pacific  tropical  cyclones.  The  first 
45  modes  account  for  73-78%  of  the  variance  in  the  relative  vorticity  fields. 
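
The representation summarized above can be written compactly. The notation below is an assumption introduced for illustration, not taken from Gunzelman (1990): $\zeta(\mathbf{x}, t)$ is the relative vorticity at gridpoint $\mathbf{x}$ and time $t$ (taken here as a departure from the sample mean), $e_n(\mathbf{x})$ is the nth eigenvector, assumed orthonormal, and $a_n(t)$ is its time-dependent coefficient.

```latex
\zeta(\mathbf{x}, t) \approx \sum_{n=1}^{45} a_n(t)\, e_n(\mathbf{x}),
\qquad
a_n(t) = \sum_{\mathbf{x}} \zeta(\mathbf{x}, t)\, e_n(\mathbf{x})
```

The approximation sign reflects the truncation at 45 modes, which retain 73-78% of the variance rather than all of it.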

The classification goals are two-fold: first, to identify tropical cyclone motion during
the 72-h forecast period as either straight or recurving; and second, to forecast the time
to recurvature with 12-h accuracy. The time series of the first and second EOF
coefficients for recurving storms vary in a systematic manner as the tropical cyclone moves
around the subtropical ridge. In contrast, the coefficients for straight-moving storms
tend to cluster about different mean EOF 1-2 values. Taking this Euclidean distance
approach, additional EOF predictors are identified that best separate recurvers and
straight-movers in multidimensional EOF space. Classification of an individual case is
then into the closest 12-h time-to-recurvature group or straight-mover category as
measured in multidimensional EOF space. The Euclidean approach provides physical
insight into the classification problem and demonstrates skill relative to climatological
forecasts. However, there is no objective method of determining the optimum set of
predictors or weighting the individual predictors in the model according to their
significance in separating among the classification groups.
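
The Euclidean classification summarized above amounts to a nearest-group search in EOF space. The sketch below assumes that each group is represented by the mean coefficient values of its learning cases and that all predictors are weighted equally, which matches the lack of objective weighting noted above. The function names and numbers are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Unweighted Euclidean distance between two points in EOF space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_group(case, centroids):
    """Return the group whose mean EOF coefficients lie closest to the case."""
    return min(centroids, key=lambda g: euclidean_distance(case, centroids[g]))


# Hypothetical group means for the first two EOF coefficients.
centroids = {"R-00h": [0.0, 0.0], "R-48h": [5.0, -3.0], "PRNR": [10.0, 10.0]}
print(closest_group([1.0, 2.0], centroids))
print(closest_group([9.0, 7.0], centroids))
```

Because every coefficient contributes on the same scale, a predictor with large variance dominates the distance, which is one reason the discriminant analysis weighting is the more objective choice.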

A more objective discriminant analysis technique is employed to more fully exploit
the predictive capabilities of these EOF coefficients. In this approach, the entire set of
782 cases from 97 recurving and straight-moving tropical cyclones is used to both derive
and test the recurvature model classifications. A final 250 mb discriminant analysis
model is useful (72% correct) in identifying recurving (80%) and straight (66%) motion
during the 72-h forecast period. Skill in distinguishing among the 12-h time-to-recurvature
groups (R-00h through R-96h) plus the combined straight-mover and recurving
storm cases more than 96 h prior to recurvature (PRNR) is only 60, 29, 29, 12, 13, 22,
19, 7, 29, and 47%, respectively. While these results represent improvement over the
Euclidean model forecasts, the skill in identifying the time to recurvature is less than
desired for operational use. The relatively poor skill in classifying cases in the
intermediate time-to-recurvature categories is attributed to the high variability among the
synoptic fields that precede recurvature. Better skill (79% correct) in identifying storm
motion during the 72-h forecast period can be achieved if classifications are only into
two groups (recurver versus straight), rather than into the nine 12-h time-to-recurvature
groups plus PRNR. Thus, the number and composition of the classification groups
must be a trade-off between the forecaster's need to specify a precise time of recurvature
and the diminishing skill as more time precision is attempted in the forecast model.

The EOF coefficients for 250 mb vorticity provide the best time-to-recurvature
forecast skill. The coefficients for this pressure level are statistically the most distinct
among the time-to-recurvature groups and the 250 mb eigenvectors represent more
variance in the vorticity fields than those for the other two pressure levels. In addition, the
magnitude of the vorticity of the subtropical ridge increases with height and is greatest
at 250 mb. The 700 mb coefficients provide the next best model skill. Although the
eigenvectors for this pressure level account for less variance than those for 400 mb, the
relative vorticity gradients between the cyclone and the subtropical ridge are greatest at
700 mb. Since more reliable data are available over open ocean areas at the upper levels
from pilot reports and satellite-derived winds, the individual 12-hourly cases should be
better defined and better forecast at 250 mb.

Since no classification groups are included for odd-mover motion, such as loops and
stairsteps, these types of tracks are forecast into the most similar time-to-recurvature
group. For example, an anticyclonic loop might be classified as recurvature. Perhaps
the EOF representation of synoptic vorticity will not be able to identify the precise type
of odd-mover motion resulting from the smaller and faster time scale forcing
mechanisms such as multiple storm interactions. Thus, distinction between a storm that will
merely step or loop to the northeast and one that will continue recurvature motion to
the northeast is needed.

The results from these feasibility tests indicate the usefulness of an EOF
representation of synoptic vorticity at one pressure level. Better skill may be achieved if the EOF
coefficients for more than one pressure level are used, or if this EOF representation of
the synoptic fields is combined with other factors such as persistence and climatology.
Other analysis methods, such as multiple linear regression, that better exploit the time
trends in continuous data of this type should also be tested. As more data become
available, independent testing and stratification of the sample will be possible. One
problem with this initial investigation is that it is assumed that only one set of vorticity
patterns leads to recurvature. In reality, several distinct paths may be defined by the
time-dependent coefficients in multidimensional space. Such differences could be due to
different forcing mechanisms associated with recurvature, or more simply due to the
differences in the large-scale vorticity patterns with latitude. While these preliminary
results in pinpointing the precise (12-h) time to recurvature are somewhat discouraging,
other statistical techniques may prove more successful.